Do Open Wards increase turnout?

I’m a committeeperson in the 24th Ward, which serves West Philly’s Mantua, Parkside, and Powelton neighborhoods. In May 2022, we voted to become an Open Ward. What does that mean? There’s no single definition, and each Open Ward establishes its own bylaws. But at a high level, being “Open” means that Committeepeople vote on which candidates to endorse and on other Ward operations, rather than simply following the Ward Leader. In addition to being more democratic, we also hope that by empowering committeepeople in this way, we can increase participation and Ward capacity for Get Out the Vote and other efforts.

Many of these benefits are fuzzy, but an increase in voter participation should be measurable. So, do we have any evidence that this works? Do Open Wards increase turnout? A number of wards have been open for years (some even for decades). Today, I’m doing some statistical analyses to measure if that’s true.

First, what wards are “Open”? As I said, there’s no single definition. But for this analysis, I’m using the set of Wards that have bylaws allowing committeepeople to vote on endorsements. Those I’m considering Open are:

Ward Year Opened
1 2018
2 2018
5 60s or 70s
8 60s or 70s
9 60s or 70s
15 2022
18 2018
24 2022
27 60s or 70s
30 1998
39a 2022

(If you want to contest these definitions or years, my email is open.)

View code
library(sf)
library(dplyr)
library(tidyr)
library(ggplot2)
source("../../admin_scripts/theme_sixtysix.R")
source("../../admin_scripts/util.R")
asnum <- function(x) as.numeric(as.character(x))

divs <- st_read("../../data/gis/warddivs/202011/Political_Divisions.shp")ward_from_div <- function(warddiv){
  ward <- substr(warddiv, 1, 2)
  div <- substr(warddiv, 4,5) 
  ward <- case_when(
    ward == "39" & asnum(div) >= 25 ~ "39a",
    ward == "39" & asnum(div) < 25 ~ "39b",
    TRUE ~ ward
  )
  return(ward)
}

divs <- divs |>
  mutate(
    warddiv = pretty_div(DIVISION_N),
    ward = ward_from_div(warddiv)
  )

wards <- divs |> group_by(ward) |> summarise() |> st_make_valid()

open_wards <- tribble(
  ~ward, ~year_opened,
  "01", "2018",
  "02", "2018",
  "05", "60s or 70s",
  "08", "60s or 70s",
  "09", "60s or 70s",
  "15", "2022",
  "18", "2018",
  "24", "2022",
  "27", "60s or 70s",
  "30", "1998",
  "39a", "2022"
)

wards <- wards |> mutate(
  is_open = ward %in% open_wards$ward,
  is_open_pre_22 = ward %in% (
    open_wards |> filter(year_opened != "2022") |> with(ward)
  )
)

ggplot(wards) + 
  geom_sf(aes(fill = is_open)) +
  scale_fill_manual(
    values=c(`TRUE` = light_blue, `FALSE` = light_grey),
    guide=FALSE
  ) +
  geom_text(
    data = wards |> filter(is_open) |> 
      mutate(
        x = st_coordinates(st_centroid(geometry))[,"X"],
        y = st_coordinates(st_centroid(geometry))[,"Y"]
      ),
    aes(
      x = x,
      y = y,
      label = ward
    ), 
    size = 3
) +
  theme_map_sixtysix() +
  labs(
    title="Philadelphia's Open Wards"
  )

The first thing you’ll notice is that these wards cluster around Center City. As such, they tend to be wealthier and whiter than the city as a whole; the same demographics that have seen a generalized surge in turnout since 2016. That makes the causality here difficult to measure: does the openness of a ward increase turnout, or did these wards become open thanks to local political engagement, and turnout there would have increased regardless?

There’s another complication in answering the question: how should we define turnout? I don’t love the usual measure, votes cast divided by registered voters; that number will be alarmingly low because of inactive voters still on the rolls, and the denominator moves for all sorts of reasons, including slow cleaning of the voter rolls or registration drives (I’ve called this the Turnout Funnel). And wards might have very different populations with different propensities to vote, or even be eligible to vote, which we want to control for.

I’ll use my favorite metric: turnout in an off-year election as a percentage of Presidential turnout. Off-year turnout is more malleable to local GOTV efforts: there’s more need just to remind people when Election Day is, and people get less communication about the election from other channels. A division’s Presidential turnout gives a ceiling on plausible off-year turnout. In this case, I’ll measure turnout in November 2022 divided by turnout in November 2020.

View code
df_major <- readRDS("../../data/processed_data/df_major_20230116.Rds")

turnout <- df_major |> filter(is_topline_office, ward != "99") |>
  group_by(warddiv, year, election_type) |>
  summarise(votes=sum(votes))

turnout <- turnout |> left_join(
  divs |> as.data.frame() |> select(warddiv, ward),
  by="warddiv"
) |>
  left_join(
    wards |> as.data.frame() |> select(ward, is_open, is_open_pre_22),
    by="ward"
  )

div_turnout <- turnout |>
  pivot_wider(
    names_from=c("election_type", "year"),
    values_from = "votes"
  )

ward_turnout <- turnout |> 
  group_by(ward, year, election_type, is_open, is_open_pre_22) |>
  summarise(votes=sum(votes)) |>
  pivot_wider(names_from=c("election_type", "year"), values_from="votes")

ward_turnout |>
  mutate(row = case_when(is_open ~ ward, TRUE ~ "Closed Wards")) |>
  group_by(is_open) |>
  summarise(votes_20 = sum(general_2020), votes_22 = sum(general_2022, na.rm=T)) |>
  mutate(ratio = votes_22 / votes_20)
View code
ggplot(wards |> left_join(ward_turnout, by="ward", suffix=c("", "_w"))) + 
  geom_sf(aes(fill = general_2022 / general_2020)) +
  geom_sf(aes(color=is_open), fill = NA, lwd=1) +
  scale_color_manual(guide=FALSE, values=c(`TRUE` = "white", `FALSE` = alpha("white", 0))) +
  scale_fill_viridis_c() +
  theme_map_sixtysix() %+replace%
  theme(legend.position = "right") +
  labs(
    title="Turnout in 2022 vs 2020",
    subtitle="Open Wards are outlined.",
    fill = "Votes cast \n2022 / 2020"
  )

Overall, the Open Wards (including those that only became Open later) cast 148,000 votes for President in 2020, and then 118,000 for Governor in 2022, an 80% rate. The other wards cast 594,000 and 380,000, a 64% rate. That’s a 16 percentage point difference!

But it might not be causal. We need to control for the ways Open Wards are systematically difference from Closed ones. We could do this by adding a bunch of control variables to a regression, but I prefer a stronger approach: using spatial boundary discontinuities. We can look at neighboring divisions on opposite sides of a ward boundary, one in a closed ward and one in an Open Ward) and compare the differences. By using only neighboring divisions, we create a useful apples-to-apples comparison; we assume that any systematic differences between these divisions is the effect of Open Wards. (I did a similar analysis in 2019, measuring the power of ward endorsements.)

For this analysis, I’ll limit to wards that were only Open before 2022 (sadly excluding my own 24). I filter down so each division is only in a single pair, and control for the proportions Black and Hispanic in the divisions (the pairing of divisions across boundaries should mean these proportions are not that different, but this will control for any lingering differences that do exist).

View code
block_data <- readr::read_csv(
  "../../data/census/decennial_2020_poprace_phila_blocks/DECENNIALPL2020.P4_data_with_overlays_2021-09-22T083940.csv",
  skip=1
)
block_shp <- sf::st_read(
  "../../data/gis/census/tl_2020_42101_tabblock/tl_2020_42_tabblock20.shp"
)block_shp <- st_transform(block_shp, st_crs(wards))
block_cents <- block_shp |> st_centroid()

st_contains_df <- function(shp1, shp2, id1, id2){
  raw <- st_contains(shp1, shp2)
  res <- list()
  for(i in 1:length(id1)){
    res[[i]] <- data.frame(
      id1 = id1[i],
      idx2 = raw[[i]],
      id2 = id2[raw[[i]]]
    )
  }
  return(bind_rows(res))
}

blocks_to_wards <- st_contains_df(wards, block_cents, wards$ward, block_cents$GEOID20) |>
  rename(ward = id1, block_i = idx2, geoid = id2)

res <- list()
for(ward in unique(blocks_to_wards$ward)){
  blocks_w <- block_cents[
    blocks_to_wards |> filter(ward == !!ward) |> with(block_i),
  ]
  divs_w <- divs |> filter(ward == !!ward)
  blocks_to_divs <- st_contains_df(divs_w, blocks_w, divs_w$warddiv, blocks_w$GEOID20) |>
    select(-idx2) |>
    rename(warddiv = id1,  geoid = id2)
  res[[ward]] <- blocks_to_divs
}
blocks_to_divs <- bind_rows(res)

div_pops <- blocks_to_divs |>
  inner_join(block_data |> mutate(geoid = substr(id, 10,24)), by="geoid") |>
  group_by(warddiv) |>
  summarise(
    pop = sum(`!!Total:`),
    hisp = sum(`!!Total:!!Hispanic or Latino`),
    nhw =  sum(`!!Total:!!Not Hispanic or Latino:!!Population of one race:!!White alone`),                                                                            
	nhb = sum(`!!Total:!!Not Hispanic or Latino:!!Population of one race:!!Black or African American alone`),  
    nha = sum(`!!Total:!!Not Hispanic or Latino:!!Population of one race:!!Asian alone`)
  )
neighboring_wards <- st_touches(wards)
neighboring_wards <- neighboring_wards |>
  lapply(as.data.frame) |>
  bind_rows(.id = "id0") |>
  mutate(id0 = asnum(id0)) |>
  rename(id1 = `X[[i]]`) |>
  filter(id0 < id1) |>
  mutate(ward0 = wards$ward[id0], ward1 = wards$ward[id1])

neighboring_divs <- vector(mode="list", length=nrow(neighboring_wards))
for(row in 1:nrow(neighboring_wards)){
  ward0 <- neighboring_wards$ward0[row]
  ward1 <- neighboring_wards$ward1[row]
  
  divs0 <- divs |> filter(ward == ward0)
  divs1 <- divs |> filter(ward == ward1)
  touches <- st_touches(divs0, divs1) |>
    lapply(as.data.frame) |>
    bind_rows(.id = "id0") |>
    mutate(id0 = asnum(id0)) |>
    rename(id1 = `X[[i]]`) |>
    mutate(div0 = divs0$warddiv[id0], div1 = divs1$warddiv[id1])
  neighboring_divs[[row]] <- touches |> select(div0, div1)
}
neighboring_divs <- bind_rows(neighboring_divs)

neighboring_divs <- neighboring_divs |>
  left_join(
    div_turnout, by=c("div0"="warddiv")
  ) |>
  left_join(
    div_turnout, by=c("div1"="warddiv"),
    suffix=c("_0", "_1")
  ) |>
  left_join(
    div_pops, by=c("div0" = "warddiv")
  )|>
  left_join(
    div_pops, by=c("div1" = "warddiv"),
    suffix=c("_0", "_1")
  )

neighboring_divs <- neighboring_divs |>
  mutate(
    prop_22_over_20_0 = general_2022_0 / general_2020_0,
    prop_22_over_20_1 = general_2022_1 / general_2020_1,
    signed_is_open = is_open_1 - is_open_0,
    signed_is_open_pre_22 = is_open_pre_22_1 - is_open_pre_22_0
  )

# Grab arbitrary unique pairs
set.seed(215)
neighboring_divs_lm <- neighboring_divs |>
  group_by(div0) |>
  filter(sample.int(n()) == 1) |>
  group_by(div1) |>
  filter(sample.int(n()) == 1)

form <- as.formula(
  "prop_22_over_20_1 - prop_22_over_20_0 ~ -1 + signed_is_open_pre_22 + I(hisp_1 / pop_1 - hisp_0 / pop_0) + I(nhb_1 / pop_1 - nhb_0 / pop_0)"
)
fit <- lm(form, data = neighboring_divs_lm) 
summary(fit)
## 
## Call:
## lm(formula = form, data = neighboring_divs_lm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31756 -0.05978 -0.00070  0.04634  0.32425 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## signed_is_open_pre_22           0.01738    0.01503   1.156    0.248    
## I(hisp_1/pop_1 - hisp_0/pop_0) -0.26663    0.06638  -4.017 7.38e-05 ***
## I(nhb_1/pop_1 - nhb_0/pop_0)   -0.24061    0.03274  -7.349 1.72e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09265 on 316 degrees of freedom
## Multiple R-squared:  0.1924, Adjusted R-squared:  0.1847 
## F-statistic:  25.1 on 3 and 316 DF,  p-value: 1.37e-14

The result: divisions in Open Wards had 1.7pp stronger turnout than divisions just across the boundary, but this result has a p-value of 0.25 and is not statistically significant (meaning we don’t have enough data to be confident it’s different from zero). This provides a confidence interval of (-1pp, 5pp).

A few notes about this analysis:

  • The result was much bigger (about 5pp) when I didn’t control for race! This means that even when comparing only division across boundaries, there are racial differences between the divisions in Open Wards and those not.
  • I’m only using one election comparison (2022 vs 2020), and we obviously have a lot more. I’ll think about ways to compellingly use more elections without introducing complications. Quick checks using 2018 vs 2016 did not substantively change the results.

So, what does this mean? Our best estimate for Open Wards’ change in turnout is a small, positive effect, but we don’t have enough data to be confident this is different from zero. Open Wards of course have other claims to benefits than just turnout: more democratic decision-making, for example. And a 2 percentage point increase in Philadelphia’s turnout, if it existed, wouldn’t be nothing: if scaled across the city, it would have meant 4,000 votes in 2019’s Superior Court election, which was decided by 17,000.

Mayoral supporters and their votes for Council At-Large

Jonathan Tannen, Sixty Six Wards
Seth Bluestein, Philadelphia City Commissioner

Today, Sixty Six Wards is partnering with Philadelphia City Commissioner Seth Bluestein to look at individual voter patterns in the 2023 Democratic Primary.

The 2023 Democratic Primary saw mayoral candidates running in distinct lanes (Sixty Six Wards’ recap, preview). Parker won handily–in retrospect–as Rhynhart and Gym came in distant second and third. In Council, all of the winners were endorsed by the Democratic City Committee, with Santamoor, McIllmurray and Almirón–in 6th, 7th, and 8th–also apparently splitting various non-machine coalitions.

In February, Sixty-Six Wards teamed up with Commissioner Bluestein to look beyond division-aggregated data at individual voter level patterns for correlations and diversity within divisions. We’re doing it again.

The Commissioner’s office has generated that dataset with anonymized, aggregated counts of candidate combinations at the Division level by vote mode–election day, mail-in, provisional–for the 2023 primary.

Rather than a gigantic piece like last time, we will provide a series of short pieces with interesting insights from the data.

How did Mayoral supporters vote for Council?

Today, we examine how each mayoral candidate’s supporters voted for council. Among Divisions, there were strong correlations between Parker and Thomas performance, or Gym and Landau. Some of that may be ecological: supporters tend to live in the same places, but might not be the same people. Just how strong is the voter-level trend?

View code
library(dplyr)
library(ggplot2)
library(tidyr)

pairs <- readr::read_csv(
  "../../data/raw_election_data/pairs_new.csv",
  locale = readr::locale(encoding = "Latin1")
)
pairs_backup <- pairs
# pairs <- pairs_backup

toplines <- readr::read_csv(
  "../../data/raw_election_data/primary_2023_combinations_toplines.csv",
  locale = readr::locale(encoding = "Latin1")
)
voters <- readr::read_csv(
  "../../data/raw_election_data/primary_2023_combinations_voters.csv"
)
histogram <- readr::read_csv(
  "../../data/raw_election_data/primary_2023_combinations_histogram.csv",
  locale = readr::locale(encoding = "Latin1")
)

toplines$office_candidate <- factor(toplines$office_candidate)

levels_df <- data.frame(
  office_candidate=levels(toplines$office_candidate)
) |>
  separate(office_candidate, into=c("office_orig", "candidate_orig"), sep="::", remove=FALSE) |>
  mutate(
    office_orig = stringr::str_trim(office_orig),
    office = stringi::stri_trans_totitle(office_orig),
    office = gsub("(Dem|Rep)$", "(\\1)", office),
    candidate_orig = stringr::str_trim(candidate_orig),
    candidate = gsub("\\s*(DEM|REP) \\([0-9]+\\)$", "", candidate_orig),
    candidate = stringi::stri_trans_totitle(stringr::str_trim(candidate)),
    candidate = gsub("\\bMc([a-z])", "Mc\\U\\1", candidate, perl=TRUE),
    candidate = gsub("\\b([Ii]+)\\b", "\\U\\1", candidate, perl=TRUE),
  )

pairs <- pairs |> filter(candidate.x != "overvote")
pairs <- pairs |> filter(candidate.y != "overvote")

pairs$office.x <- factor(pairs$office.x)
pairs$office.y <- factor(pairs$office.y)
pairs$candidate.x <- factor(pairs$candidate.x)
pairs$candidate.y <- factor(pairs$candidate.y)

levels(pairs$office.x) <- levels_df$office[match(levels(pairs$office.x), levels_df$office_orig)]
levels(pairs$office.y) <- levels_df$office[match(levels(pairs$office.y), levels_df$office_orig)]
levels(pairs$candidate.x) <- levels_df$candidate[match(levels(pairs$candidate.x), levels_df$candidate_orig)]
levels(pairs$candidate.y) <- levels_df$candidate[match(levels(pairs$candidate.y), levels_df$candidate_orig)]

testthat::expect_false(any(is.na(pairs$office.x)))
testthat::expect_false(any(is.na(pairs$office.y)))
testthat::expect_false(any(is.na(pairs$candidate.x)))
testthat::expect_false(any(is.na(pairs$candidate.y)))

histogram$office_candidate <- factor(histogram$office_candidate, levels=levels(toplines$office_candidate))
histogram$office <- levels_df$office[histogram$office_candidate]
histogram$candidate <- levels_df$candidate[histogram$office_candidate]

toplines$office <- levels_df$office[toplines$office_candidate]
toplines$candidate <- levels_df$candidate[toplines$office_candidate]

Council Results by Mayor

View code
mayor_council <- pairs |>
  filter(
    office.y == "Mayor (Dem)", 
    office.x == "Council At-Large (Dem)"
  ) |> 
  rename(mayor = candidate.y, council=candidate.x) |>
  group_by(mayor, council, Type) |>
  summarise(counts = sum(n), .groups="drop")

# Add candidate, Type pairs that are missing
mayor_council <- expand.grid(
      mayor=unique(mayor_council$mayor), 
      Type=unique(mayor_council$Type),
      council=unique(mayor_council$council)
    ) |>
  left_join(mayor_council) |>
  mutate(counts = replace_na(counts, 0)) |>
  left_join(
    toplines |> 
      filter(office == "Mayor (Dem)") |> 
      group_by(candidate, Type) |> 
      summarise(topline_votes=sum(votes), .groups="drop"), 
    by=c("mayor" = "candidate", "Type"="Type")
  ) |>
  mutate(pvote = counts / topline_votes)

get_order <- function(office){
  toplines |>
    filter(office == !!office) |>
    group_by(candidate) |>
    summarise(votes = sum(votes), .groups="drop") |>
    arrange(desc(votes)) |>
    with(candidate)
}

council_order <- get_order("Council At-Large (Dem)")
mayor_order <- get_order("Mayor (Dem)")

mayor_council <- mayor_council |>
  mutate(
    council = factor(council, levels=council_order),
    mayor = factor(mayor, levels=mayor_order)
  )

ggplot(
  mayor_council |> 
    group_by(mayor, council) |> 
    summarise(topline_votes=sum(topline_votes), counts=sum(counts)),
  aes(x=council, y=100*counts/topline_votes)
) + 
  geom_bar(stat="identity") +
  facet_wrap(~mayor) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1, size=5)) +
  labs(
    title="Votes for Council At-Large by Mayoral Vote (Dem)",
    x=NULL,
    y="Percent of Mayoral voters who cast\nvote for Council candidate"
  ) 

Among Parker voters, the five eventual topline winners were on top. But the order was starkly different: third-place Rue Landau finished a distant fifth among this group. Conversely, Landau was the highest vote-getter among Gym voters, and among Rhynhart voters was on par with Thomas and Gilmore-Richardson. There are a few other noticeable peaks in the plot above: Santamoor and Itzkowitz received 36% and 25% of votes from Rhynhart voters, and McIllmurray and Almirón whopping 48% and 46% from Gym voters.

Amusingly, voters who wrote in choices for Mayor were vastly more likely to write in for council, too.

Those patterns didn’t change across vote mode, with Election Day voters and Mail-In voters showing similar preferences (the plots below show only the top four mayoral candidates, for sanity).

View code
ggplot(
  mayor_council |> 
    filter(mayor %in% c("Cherelle L Parker", "Rebecca Rhynhart", "Helen Gym", "Allan Domb")),
  aes(x=council, y=100*counts/topline_votes)
) + 
  geom_bar(stat="identity") +
  facet_grid(Type~mayor) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1, size=5)) +
  labs(
    title="Votes for Council At-Large by Mayoral Vote and Vote Type",
    x=NULL,
    y="Percent of Mayoral voters who cast\nvote for Council candidate"
  ) 

Notice that the average heights of the bars are different between facets; some mayoral candidates’ supporters cast more votes for Council. The heights of the bars would sum to 500 if every voter used all five votes, but only 400 if they used on average four, 300 if three, etc.

View code
mayor_council |>
  group_by(mayor, Type, topline_votes) |>
  summarise(counts = sum(counts), .groups="drop") |>
  mutate(votes_per_voter = counts/topline_votes) |>
  ggplot(
    aes(x=mayor, y=votes_per_voter)
  ) +
  geom_bar(stat="identity") +
  facet_grid(Type~.) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=60, vjust=1, hjust=1)) +
  labs(
    title="Council At-Large votes per voter, by Mayoral choice",
    x=NULL,
    y="Average number of votes cast\nfor Council At-Large"
  )

Among Election Day voters, Gym and Rhynhart supporters cast 3.4 and 3.3 votes for Council At-Large, respectively. Parker supporters cast 2.7, and the rest of voters about 2.5. That difference vanished among Mail-In voters: all candidates’ supporters cast between 3.6 and 4.1 At-Large votes. An obvious explanation is that mail-in voters have time to look up down-ballot candidates with the ballot in hand that Election Day voters do not.

Suppose we recreated the above plot at the Division level: categorize Divisions by which mayoral candidate won, and calculate each council candidate’s percent (we’ll use total votes for Mayor as the denominator). The patterns are reasonably similar to the person-level results.

View code
div_mayor <- toplines |> filter(office == "Mayor (Dem)") |>
  group_by(Division, candidate) |> 
  summarise(votes = sum(votes)) |>
  group_by(Division) |>
  summarise(
    winner = candidate[which.max(votes)],
    pvote = votes[which.max(votes)] / sum(votes),
    mayoral_votes = sum(votes)
  ) |>
  mutate(mayor = factor(winner, levels=mayor_order))

div_council <- toplines |> filter(office == "Council At-Large (Dem)") |>
  group_by(Division, candidate) |> 
  summarise(votes = sum(votes)) |>
  mutate(council = factor(candidate, levels=council_order))

div_council |> 
  left_join(div_mayor, by="Division") |>
  group_by(mayor, council) |>
  summarise(
    votes = sum(votes),
    mayoral_votes = sum(mayoral_votes)
  ) |>
  ggplot(
    aes(x=council, y=100*votes / mayoral_votes)
  ) +
  geom_bar(stat="identity") +
  facet_wrap(~mayor) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1, size=5)) +
  labs(
    title="Votes for Council At-Large by Division's Mayoral Winner",
    x=NULL,
    y="Percent (Votes / Total votes for office of Mayor)"
  ) 

There are some interesting, small differences between the plots: Rhynhart Divisions supported McIllmurray over Harrity by 3.2pp, but Rhynhart voters supported Harrity over McIllmurray by 2pp. This is due to ecological correlations: Rhynhart voters live disproportionately in Divisions where the other voters were more likely to support McIllmurray (presumably Gym voters). Similarly, Santamoor beat Harrity by 3.9pp in Gym Divisions, but only by 0.6pp among Gym voters.

Regression Analysis

A form of ecological effects could permeate even the voter-level results above. Maybe Rhynhart voters came from Divisions more likely to support Santamoor, but within each Division they were just like the others. We can solve this by fitting a regression, using Division fixed effects. This allows us to say, for example, that within a given Division, Cherelle Parker supporters were on average 3.2 percentage points more likely to support Jim Harrity, and less likely to support Amanda McIllmurray and Rue Landau (8.8 and 7.9pp, respectively).

View code
mayor_council_div <- toplines |> 
    filter(office == "Mayor (Dem)") |>
    rename(voters=votes, mayor=candidate) |>
    select(Type, Division, mayor, voters) |>
    # Need this cross join in case there are zero votes for a council candidate.
    cross_join(
        data.frame(council=unique(mayor_council$council))
    ) |>
    left_join(
      pairs |> 
        filter(
          office.y == "Mayor (Dem)", 
          office.x == "Council At-Large (Dem)"
        ) |>
        rename(mayor = candidate.y, council=candidate.x, votes = n) |>
        select(mayor, council, Type, Division, votes),
      by=c("mayor", "council", "Type", "Division")
    ) |>
  mutate(votes=replace_na(votes, 0)) |>
  group_by(Division, mayor, council) |>
  summarise(
    voters=sum(voters),
    votes=sum(votes)
  )

RERUN <- FALSE

library(purrr)

if(RERUN){
  df_cand <- mayor_council_div |>
    ungroup() |>
    filter(council != "Write-In") |>
    # filter(council == !!council_candidate) |>
    cross_join(data.frame(vote_for=c(0,1))) |>
    mutate(votes = ifelse(vote_for==1, votes, voters-votes))
  fits <- list()
  
  for(council in unique(df_cand$council)){
    print(council)
    if(!council %in% names(fits)){
      fits[[council]] <- lm(
        vote_for ~ Division + mayor, 
        w=votes,
        data=df_cand |> filter(council==!!council)
      )
    }
  }  
  coefs_full <- lapply(fits, broom::tidy) |> bind_rows(.id="council")
  saveRDS(coefs_full, file="coefs_full.RDS")
} else {
  coefs_full <- readRDS("coefs_full.RDS")
}

mayor_coef <- coefs_full |> 
  filter(substr(term, 1,5) == 'mayor') |>
  mutate(mayor = gsub("^mayor", "", term))  |>
  select(council, mayor, estimate, std.error) |>
  bind_rows(
    data.frame(
      council = council_order,
      mayor = "Allan Domb", # Reference mayor
      estimate = 0,
      std.error = NA
    )
  ) |>
  left_join(
    toplines |> filter(office == "Mayor (Dem)") |>
      group_by(candidate) |>
      summarise(total_votes = sum(votes)),
    by=c("mayor"="candidate")
  ) |>
  group_by(council) |>
  mutate(weighted_coef = estimate - weighted.mean(estimate, w=total_votes))|>
  mutate(
    council = factor(council, levels=council_order),
    mayor = factor(mayor, levels=mayor_order)
  ) 


ggplot(
  mayor_coef,
  aes(x=council, y=weighted_coef)
) +
  geom_bar(stat='identity') +
  facet_wrap(~mayor) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1, size=5)) +
  labs(
    title="Mayoral supporters' relative vote for council",
    subtitle="Coefficient of mayoral support on council votes, controlling for Division fixed effects",
    x=NULL,
    y="Coefficient of Mayoral Support on Council Vote"
  )

Within a Division, Gym supporters were vastly more likely to support Thomas, Gilmore-Richardson, Landua, McIllmurray, and Almirón, and Rhynhart supporters more likely to support Santamoor, Itzkowitz, and then Gilmore-Richardson, Landau, and Ahmad.

Parker voters were relatively similar to their broader Divisions, but less likely to support Landau, McIllmurray, and Almirón.

Supporters of all mayoral candidates from Domb on were much less likely to support the top four council candidates–Thomas, Gilmore Richardson, Landau, and Ahmad–than supporters of the top three mayoral candidates.