What’s a Ward Endorsement worth?

In 2017, about 6 points

With ballot positions decided, candidates are starting to vie for coveted ward endorsements. How many votes are they really worth?

Two years ago, I did a simplistic analysis of the Court of Common Pleas, where I found that judicial candidates received 0.9 percentage points more of the vote in wards where they were endorsed. In a race where candidates win with 4.3 percent of the vote, that effect is huge (and larger than even ballot position).

There were a number of caveats to that analysis: I only had endorsements from a few, systematically different wards, and I didn’t do anything to identify causality–we know that candidates do better in wards where they are endorsed, but we don’t know if the endorsements cause that increase, or if the ward leaders were endorsing candidates who would have done well there anyway.

Let’s do better.

In 2017, Max Marin at Philadelphia Weekly undertook the herculean effort of tracking down endorsements in 62 of Philadelphia’s 66 wards. Let’s use that, and do some spatial econometrics.

View code
library(tidyverse)
library(sp)
library(rgeos)
library(rgdal)
library(sf)
source("../../admin_scripts/util.R")

df_major <- safe_load("../../data/processed_data/df_major_2017_12_01.Rda")

df_major$WARD_DIVSN <- with(df_major, paste0(WARD16, DIV16))

df_major <- df_major %>%
  filter(
    election == "primary" & CANDIDATE != "Write In" & PARTY == "DEMOCRATIC"
  )
df_major <- df_major %>%
  group_by(WARD_DIVSN, OFFICE, year) %>%
  mutate(pct_vote = VOTES / sum(VOTES))

df_major <- df_major %>%
  filter(OFFICE %in% c("COUNCIL AT LARGE", "DISTRICT ATTORNEY"))


bg_17_acs <- read.csv("../../data/census/acs_2013_2017_phila_bg_race_income.csv")
bg_17_acs <- bg_17_acs %>% 
  mutate(Geo_FIPS = as.character(Geo_FIPS)) %>%
  select(
    Geo_FIPS, pop, pop_nh_white, pop_nh_black, pop_nh_asian, pop_hisp, pop_median_income_2017
  )

sp_divs <- readOGR("../../data/gis/2016/2016_Ward_Divisions.shp", verbose = FALSE)
sp_divs <- spChFIDs(sp_divs, as.character(sp_divs$WARD_DIVSN))
sp_divs <- spTransform(sp_divs, CRS("+init=EPSG:4326"))

library(tigris)
options(tigris_use_cache = TRUE)
bg_shp <- block_groups(42, 101, year = 2015)
bg_shp <- spChFIDs(bg_shp, as.character(bg_shp$GEOID))
bg_shp <- spTransform(bg_shp, CRS(proj4string(sp_divs)))

sp_divs$bg <- over(
  gCentroid(sp_divs, byid = TRUE), 
  bg_shp
)$GEOID

sp_divs@data <- sp_divs@data %>%
  left_join(bg_17_acs, by = c("bg"="Geo_FIPS"))

df_major <- df_major %>%
  left_join(sp_divs@data) %>%
  mutate(
    pct_wht = pop_nh_white / pop,
    pct_blk = pop_nh_black / pop,
    pct_asian = pop_nh_asian/ pop,
    pct_hisp = pop_hisp / pop
  )

In the 2017 DA race, no single candidate monopolized the endorsements; O’Neill led the way with 11 endorsements, largely in the Northeast.

View code
endorsements <- read_csv("da_2017_endorsements.csv")
endorsements$ward <- sprintf("%02d", endorsements$ward)

da_results <- df_major %>% 
  filter(election == "primary" & year == 2017 & OFFICE == "DISTRICT ATTORNEY") %>%
  mutate(
    last_name = gsub(
      "^.*\\s([A-Z])([A-Z]+)$", "\\U\\1\\L\\2", CANDIDATE, perl = TRUE
    )
  ) %>%
  group_by(WARD_DIVSN) %>%
  mutate(total_votes = sum(VOTES)) %>%
  group_by() %>%
  mutate(pvote = VOTES / total_votes)

da_results$last_name <- with(
  da_results,
  ifelse(
    last_name == "Neill", "O'Neill",
    ifelse(last_name == "Shabazz", "El-Shabazz", last_name)
  )
)

da_results %>%
  group_by(WARD16, last_name) %>%
  summarise(votes = sum(VOTES)) %>%
  group_by(WARD16) %>%
  mutate(
    ward_votes = sum(votes),
    pvote = votes / ward_votes
  ) %>%
  left_join(
    endorsements %>% mutate(is_endorsed = TRUE),
    by = c("WARD16" = "ward", "last_name" = "endorsement")
  ) %>%
  mutate(
    is_endorsed = replace(is_endorsed, is.na(is_endorsed), FALSE)
  ) %>%
  group_by(last_name, is_endorsed) %>%
  summarise(
    pct_vote = 100 * weighted.mean(pvote, w = ward_votes),
    total_votes = sum(ward_votes),
    n_wards = n()
  ) %>%
  group_by(last_name) %>%
  summarise(
    pct_vote_overall = weighted.mean(pct_vote, w = total_votes),
    wards_endorsed = ifelse(any(is_endorsed), n_wards[is_endorsed], 0),
    turnout_endorsed = ifelse(any(is_endorsed), total_votes[is_endorsed], 0),
    pct_vote_notendorsed = pct_vote[!is_endorsed],
    pct_vote_endorsed = ifelse(any(is_endorsed), pct_vote[is_endorsed], NA)
  ) %>%
  arrange(desc(pct_vote_overall)) %>%
  knitr::kable(
    digits = 0,
    format = "html",
    format.args = list(big.mark = ","),
    col.names = c(
      "Candidate",
      "Citywide % of vote",
      "Number of ward endorsements",
      "Turnout in endorsed wards",
      "% of vote in un-endorsed wards",
      "% of vote in endorsed wards"
    )
  )

 

Candidate | Citywide % of vote | Number of ward endorsements | Turnout in endorsed wards | % of vote in un-endorsed wards | % of vote in endorsed wards
Krasner | 38 | 9 | 28,700 | 36 | 46
Khan | 20 | 8 | 23,485 | 18 | 32
Negrin | 14 | 10 | 20,794 | 13 | 21
El-Shabazz | 12 | 7 | 19,504 | 11 | 17
Untermeyer | 8 | 9 | 16,200 | 7 | 17
O’Neill | 6 | 11 | 21,520 | 4 | 20
Deni | 2 | 0 | 0 | 2 | NA

Krasner won by 18 percentage points (38% to Khan’s 20%), despite receiving only a typical number of ward endorsements. The endorsements that he did receive came from wards with the highest turnout, but part of that is reverse causality: the places that he energized turned out big.

Naively, candidates did about 11 percentage points better in wards where they were endorsed than in wards where they weren’t. BUT. This suffers from the same lack of causal identification as the Judicial Analysis above: we don’t know if they did better because of the endorsements, or if they were just endorsed in wards where they would have done well anyway.
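
As a rough check on that 11-point figure, the gap can be recomputed from the rounded numbers in the table above. This is only a sketch: it takes a simple unweighted average across the six endorsed candidates, which is my assumption and not necessarily how the figure was originally computed.

```r
# Rounded values copied from the table above (Deni had no endorsements, so is omitted).
naive <- data.frame(
  candidate = c("Krasner", "Khan", "Negrin", "El-Shabazz", "Untermeyer", "O'Neill"),
  pct_unendorsed = c(36, 18, 13, 11, 7, 4),
  pct_endorsed = c(46, 32, 21, 17, 17, 20)
)

# Gap, in percentage points, between endorsed and un-endorsed wards.
naive$gap <- naive$pct_endorsed - naive$pct_unendorsed

# Unweighted average gap across candidates (an assumption; a vote-weighted
# average would differ slightly).
mean_gap <- mean(naive$gap)

print(naive)
print(round(mean_gap, 1))
```

The gap ranges widely by candidate, which is one more reason not to read the average as a causal effect.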

How can we do better? Let’s use something I noticed in last week’s post on District 7: the strength of boundaries.

The strongest ward endorsements can have visible effects in divisions just across the street from each other.

View code
wards <- readOGR("../../data/gis/2016","2016_Wards", verbose=FALSE) %>%
  spTransform(CRS(proj4string(sp_divs))) 

ggwards <- fortify(spChFIDs(wards, sprintf("%02d", wards$WARD)))

bbox <- sp_divs[substr(sp_divs$WARD_DIVSN,1,2) %in% c("10", "50", "09"),] %>% 
  gUnionCascaded() %>% 
  bbox()


bbox <- rowMeans(bbox) + 1.2 * sweep(bbox, 1, rowMeans(bbox))

polygon_in_bbox <- function(p) {
  coords <- p@Polygons[[1]]@coords
  any(
    coords[,1] > bbox[1,1] &
      coords[,1] < bbox[1,2] &
      coords[,2] > bbox[2,1] &
      coords[,2] < bbox[2,2] 
  )
}
sp_divs$in_bbox <- sapply(sp_divs@polygons, polygon_in_bbox) 

ggdivs <- fortify(spChFIDs(sp_divs, as.character(sp_divs$WARD_DIVSN)))

ggdivs <- ggdivs %>%
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, in_bbox),
    by = c("id" = "WARD_DIVSN")
  ) %>%
  left_join(
    da_results %>% filter(last_name %in% c("Khan", "Krasner", "El-Shabazz")),
    by = c("id" = "WARD_DIVSN")
  )

ward_centroids <- gCentroid(wards, byid=TRUE) %>% as.data.frame()
ward_centroids$ward <- wards$WARD

ggplot(
  ggdivs %>% filter(in_bbox),
  aes(x=long, y=lat)
) +
  geom_polygon(aes(fill = 100 * pvote, group=group), color = NA) +
  geom_polygon(data = ggwards, aes(group=group), fill = NA, color = "white") +
  geom_text(data = ward_centroids, aes(x=x, y=y, label=ward), color = "white") +
  facet_wrap(~last_name) +
  scale_fill_viridis_c("% of vote") +
  theme_map_sixtysix() +
  coord_map(xlim=bbox[1,], ylim=bbox[2,]) +
  theme(
    legend.position = "bottom",
    legend.direction = "horizontal"
  ) +
  ggtitle("Percent of the Vote in Northwest Wards", "2017 DA Race")

plot of chunk map
Wards 10 and 50 endorsed Krasner, Ward 9 endorsed Khan, and Ward 22 endorsed El-Shabazz. You can immediately see the strength of 10 and 50’s endorsements: Krasner did better in divisions inside the boundary of 10 and 50 than he did just across the street. Same for 9, maybe, where Khan did well. And El-Shabazz did better in 22, though there isn’t an obvious boundary effect.

I’ll use this intuition to measure the effect across all boundaries in the whole city. To isolate the causal effect of the wards, I’ll limit the analysis to only compare divisions that are across the street from each other but happen to be divided by a ward boundary, and where different candidates were endorsed. This will ensure that we’re comparing divisions apples-to-apples, where the only thing that’s different is the ward endorsement.
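
The selection rule can be sketched on toy data. The division IDs and the pair list below are hypothetical; only the ward endorsements come from the map above. The point is just the filter: keep a pair only if it straddles a ward boundary and the two wards endorsed different candidates.

```r
# Hypothetical toy data: each row is a pair of adjacent divisions.
pairs <- data.frame(
  div.0 = c("1001", "1002", "1003", "0901"),
  div.1 = c("1002", "5001", "0902", "0902"),
  ward.0 = c("10", "10", "10", "09"),
  ward.1 = c("10", "50", "09", "09"),
  stringsAsFactors = FALSE
)

# Endorsements keyed by ward, matching the wards discussed above.
endorsed <- c("10" = "Krasner", "50" = "Krasner", "09" = "Khan")

pairs$endorsement.0 <- unname(endorsed[pairs$ward.0])
pairs$endorsement.1 <- unname(endorsed[pairs$ward.1])

# Keep only pairs that cross a ward boundary AND where the two wards
# endorsed different candidates.
keep <- pairs$ward.0 != pairs$ward.1 &
  pairs$endorsement.0 != pairs$endorsement.1
boundary_pairs <- pairs[keep, ]

print(boundary_pairs)
```

In this toy example the 10/50 pair is dropped even though it crosses a boundary, because both wards endorsed Krasner and so there is no endorsement contrast to measure.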

I’ll go one step farther, and control for the census demographics of the block groups that the division sits in, in case a ward boundary happens to also serve as an emergent boundary (dissertation plug). I measure how each candidate’s vote correlated with the race and ethnicity of the neighborhood and subtract that out, leaving a measure of how much better or worse that candidate did than expected. It’s that “residual” that I will compare across boundaries.

View code
da_fit <- lm(
  pvote ~
    CANDIDATE * pct_wht + 
    CANDIDATE * pct_blk + 
    CANDIDATE * pct_hisp,
    # CANDIDATE * log(pop_median_income_2017),
  data = da_results
)

da_results$predicted <- predict(da_fit, newdata = da_results)
da_results$resid <- with(da_results, pvote - predicted)

neighbors <- st_intersection(st_as_sf(sp_divs), st_as_sf(sp_divs))
neighbors <- neighbors %>%
  filter(WARD != WARD.1)

neighbors <- neighbors %>%
  mutate(geometry_type = st_geometry_type(geometry)) %>%
  filter(!geometry_type %in% c("POINT", "MULTIPOINT"))


neighbors <- neighbors %>% 
  mutate(
    WARD.0 = sprintf("%02d", asnum(WARD)),
    WARD.1 = sprintf("%02d", asnum(WARD.1))
  ) %>%
  left_join(
    endorsements %>% rename(endorsement.0 = endorsement),
    by = c("WARD.0" = "ward")
  ) %>%
  left_join(
    endorsements %>% rename(endorsement.1 = endorsement),
    by = c("WARD.1" = "ward")
  ) 

neighbors <- neighbors %>% 
  left_join(
    da_results %>% 
      select(WARD_DIVSN, last_name, total_votes, pvote, resid) %>%
      rename(total_votes.0 = total_votes, pvote.0 = pvote, resid.0 = resid),
    by = c("WARD_DIVSN" = "WARD_DIVSN", "endorsement.0" = "last_name")
  ) %>% 
  left_join(
    da_results %>% 
      select(WARD_DIVSN, last_name, total_votes, pvote, resid) %>%
      rename(total_votes.1 = total_votes, pvote.1 = pvote, resid.1 = resid),
    by = c("WARD_DIVSN.1" = "WARD_DIVSN", "endorsement.0" = "last_name")
  )

To correctly measure wards’ individual strength, I fit a random effects model, which simultaneously estimates the average effect of all wards’ endorsements and how much each individual ward varies from that.
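
To give a feel for what the random effects model does, here is a simplified base-R illustration of partial pooling. It is not the lmer fit itself: the ward summaries are hypothetical, and the two variance components are assumed to be known, whereas a real fit estimates them from the data.

```r
# Hypothetical per-ward summaries: average boundary gap and number of boundary pairs.
ward_gaps <- data.frame(
  ward = c("A", "B", "C"),
  mean_gap = c(0.20, 0.06, -0.02),
  n_pairs = c(2, 20, 8)
)

# Assumed variance components; a real fit (e.g. lme4::lmer) estimates these.
tau2 <- 0.002    # variance of true ward effects around the citywide average
sigma2 <- 0.010  # noise variance of a single boundary pair

# Citywide average: precision-weighted mean of the ward gaps.
w <- 1 / (tau2 + sigma2 / ward_gaps$n_pairs)
mu <- sum(w * ward_gaps$mean_gap) / sum(w)

# Shrinkage factor: near 1 for wards with many pairs, near 0 for wards with few.
ward_gaps$shrink <- tau2 / (tau2 + sigma2 / ward_gaps$n_pairs)

# Partially pooled ward effect: pulled from the raw gap toward the citywide average.
ward_gaps$effect <- mu + ward_gaps$shrink * (ward_gaps$mean_gap - mu)

print(ward_gaps)
```

Ward A has a large raw gap but only two boundary pairs, so its estimate is pulled most of the way back toward the average; Ward B, with twenty pairs, keeps most of its own signal.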

View code
library(lme4)

df0 <- neighbors %>% filter(endorsement.0 != endorsement.1)

fit_lmer <- function(neighbor_df){
  re_fit <- lmer(
    resid.0 - resid.1 ~ (1 | WARD.0),
    data = neighbor_df,
    weights = neighbor_df %>%
      with(pmin(total_votes.0, total_votes.1))
  )

  re <- ranef(re_fit)$WARD.0 
  re <- re %>%
    mutate(
      ward = row.names(re),
      effect = re_fit@beta + `(Intercept)`
    )

  return(
    list(
      fit = re_fit,
      re = re
    )
  )
}

fit_baseline <- fit_lmer(df0)

n_boot <- 200
bs_list <- vector(mode = "list", length = n_boot)
for(b in 1:n_boot){

  sample_divs = sample(unique(df0$WARD_DIVSN), replace = TRUE)

  #if(b %% floor(n_boot / 10) == 0) print(b)
  df_samp <- data.frame(WARD_DIVSN = sample_divs) %>% left_join(df0)
  bs_fit <- fit_lmer(df_samp)
  bs_list[[b]] <- bs_fit
}

fixef_ci <- quantile(
  sapply(bs_list, function(x) fixef(x$fit)),
  c(0.025, 0.975)
)
cat(paste0(
  "Average Effect of a Ward Endorsement:\n",
  sprintf(
    "%0.1f (%0.1f, %0.1f)",
    fixef(fit_baseline$fit)["(Intercept)"] * 100,
    fixef_ci[1] * 100,
    fixef_ci[2] * 100
  )
))
## Average Effect of a Ward Endorsement:
## 5.8 (5.0, 6.9)

The average Ward endorsement was worth 5.8 percentage points in the 2017 DA race. This is about half of the 11 percentage point gap we saw in the naive analysis above; it turns out the other half was because of wards endorsing candidates that the voters already supported.

But some wards are much more important than others.

How does each ward’s endorsement stack up? The table below sorts the wards by order of the vote effect, which is the percentage effect of the endorsement times the 2017 primary turnout.

View code
ranef_ci <- bind_rows(
  lapply(bs_list, function(x) x$re), 
  .id = "sim"
) %>%
  group_by(ward) %>%
  summarise(
    p025 = quantile(effect, 0.025),
    p975 = quantile(effect, 0.975)
  )

fit_baseline$re %>% 
  select(ward, effect) %>% 
  left_join(ranef_ci) %>%
  mutate(
    ci = sprintf("(%0.1f, %0.1f)", 100 * p025, 100*p975)
  ) %>%
  left_join(
    da_results %>% 
      group_by(WARD16, last_name) %>%
      summarise(
        pvote = 100 * weighted.mean(pvote, w = total_votes),
        total_votes = sum(total_votes)
      ) %>%
      inner_join(
        endorsements, 
        by = c("WARD16" = "ward", "last_name" = "endorsement")
      ),
    by = c("ward" = "WARD16")
  ) %>%
  mutate(
    pvote = round(pvote, 0),
    effect = round(100 * effect, 0),
    vote_effect = round(effect/100 * total_votes)
  ) %>%
  rename(endorsement = last_name) %>%
  select(ward, endorsement, pvote, effect, ci, total_votes, vote_effect) %>%
  arrange(desc(vote_effect)) %>% 
  DT::datatable(
    rownames=FALSE,
    colnames = c("Ward", "Endorsee", "% of Vote in Ward", "Endorsement Effect at Boundary","CI", "Ward Votes", "Vote Effect of Endorsement")
  )

Ward | Endorsee | % of Vote in Ward | Endorsement Effect at Boundary | CI | Ward Votes | Vote Effect of Endorsement
10 | Krasner | 50 | 14 | (8.0, 19.5) | 3,719 | 521
09 | Khan | 37 | 12 | (6.7, 17.5) | 4,264 | 512
30 | Khan | 39 | 14 | (8.5, 17.9) | 3,403 | 476
52 | Negrin | 17 | 10 | (1.1, 20.0) | 3,768 | 377
36 | Negrin | 18 | 9 | (4.8, 13.6) | 3,932 | 354
56 | Untermeyer | 26 | 14 | (9.9, 19.5) | 2,346 | 328
50 | Krasner | 56 | 6 | (-0.7, 15.6) | 5,094 | 306
61 | El-Shabazz | 24 | 12 | (4.7, 19.3) | 2,547 | 306
40 | O’Neill | 15 | 8 | (4.8, 14.3) | 3,591 | 287
42 | Krasner | 45 | 20 | (14.6, 25.2) | 1,270 | 254
38 | Negrin | 29 | 10 | (3.3, 21.0) | 2,507 | 251
01 | O’Neill | 13 | 7 | (3.7, 9.1) | 2,954 | 207
03 | Untermeyer | 22 | 8 | (3.0, 14.3) | 2,312 | 185
63 | O’Neill | 24 | 8 | (2.4, 16.0) | 1,920 | 154
19 | Negrin | 50 | 26 | (16.4, 40.6) | 589 | 153
57 | O’Neill | 26 | 8 | (4.5, 13.0) | 1,719 | 138
65 | O’Neill | 26 | 8 | (5.9, 11.0) | 1,644 | 132
23 | O’Neill | 20 | 10 | (5.2, 18.7) | 1,284 | 128
05 | Khan | 30 | 2 | (-2.5, 7.2) | 5,927 | 119
60 | Negrin | 15 | 5 | (2.0, 8.2) | 2,350 | 118
07 | Negrin | 55 | 21 | (6.2, 36.9) | 548 | 115
21 | Khan | 34 | 2 | (-3.3, 8.7) | 5,383 | 108
31 | Khan | 23 | 5 | (0.4, 9.0) | 2,076 | 104
24 | Untermeyer | 11 | 7 | (4.4, 11.0) | 1,437 | 101
51 | Untermeyer | 13 | 4 | (1.8, 9.8) | 2,386 | 95
46 | El-Shabazz | 9 | 2 | (-0.9, 5.3) | 4,515 | 90
16 | Untermeyer | 21 | 9 | (1.7, 13.0) | 965 | 87
12 | Krasner | 42 | 3 | (-0.7, 7.8) | 2,627 | 79
27 | Krasner | 70 | 4 | (-3.2, 11.4) | 1,978 | 79
41 | Khan | 24 | 7 | (3.9, 11.0) | 975 | 68
48 | Untermeyer | 13 | 4 | (1.8, 7.8) | 1,623 | 65
64 | O’Neill | 22 | 7 | (-0.8, 13.8) | 795 | 56
58 | Untermeyer | 15 | 2 | (-6.2, 7.2) | 2,606 | 52
14 | Negrin | 12 | 5 | (2.8, 7.9) | 986 | 49
34 | Krasner | 34 | 1 | (-3.6, 5.2) | 4,900 | 49
06 | Krasner | 47 | 3 | (-10.0, 10.1) | 1,605 | 48
39 | O’Neill | 25 | 1 | (-1.3, 5.9) | 4,462 | 45
43 | Negrin | 23 | 4 | (-0.1, 8.4) | 1,091 | 44
25 | Untermeyer | 14 | 5 | (-1.5, 11.7) | 801 | 40
55 | O’Neill | 18 | 3 | (-2.8, 7.9) | 1,305 | 39
54 | O’Neill | 11 | 4 | (0.8, 6.5) | 720 | 29
62 | O’Neill | 23 | 2 | (-1.8, 5.8) | 1,126 | 23
45 | Khan | 22 | 2 | (-1.6, 5.5) | 891 | 18
32 | El-Shabazz | 27 | 1 | (-3.3, 6.7) | 1,727 | 17
33 | Khan | 26 | 3 | (-1.4, 9.3) | 566 | 17
35 | Untermeyer | 11 | 1 | (-1.9, 5.6) | 1,724 | 17
04 | El-Shabazz | 31 | 0 | (-3.1, 4.5) | 2,110 | 0
44 | Krasner | 37 | 0 | (-5.4, 7.7) | 1,416 | 0
47 | El-Shabazz | 15 | -1 | (-5.0, 5.9) | 749 | -7
29 | Negrin | 23 | -2 | (-5.5, 4.2) | 1,331 | -27
15 | Negrin | 18 | -1 | (-5.1, 3.4) | 3,692 | -37
49 | El-Shabazz | 19 | -3 | (-7.1, 0.3) | 2,348 | -70
22 | El-Shabazz | 13 | -2 | (-4.5, 1.3) | 5,508 | -110
08 | Krasner | 42 | -3 | (-6.8, 0.6) | 6,091 | -183

The most important ward in 2017 was Ward 10, which gave Krasner a 14 percentage point boost on a turnout of 3,719, meaning an estimated bump of 521 votes. (The exact order of the rankings has a lot of uncertainty. Don’t take it as gospel.) Those three Northwest wards we looked at above, 10, 9, and 50, were all in the top seven, with 10 and 9 taking first and second place, largely on the back of their high turnout.
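
The vote effect column is just the rounded percentage effect times ward turnout. A minimal sketch, using the top three rows of the table as inputs:

```r
# Values copied from the top three rows of the table above.
top_wards <- data.frame(
  ward = c("10", "09", "30"),
  effect_pts = c(14, 12, 14),         # endorsement effect, in percentage points
  ward_votes = c(3719, 4264, 3403)    # 2017 primary turnout in the ward
)

# Estimated votes gained: share effect times turnout, rounded to whole votes.
top_wards$vote_effect <- round(top_wards$effect_pts / 100 * top_wards$ward_votes)

print(top_wards)
```

Because the effect is rounded to a whole percentage point before multiplying, small differences in vote effect between neighboring rows are not meaningful.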

What this means for May

This analysis is specific to the 2017 DA race in a number of ways. I expect ward endorsements to have more importance in low-information races, and all of the races this time around–City Council At Large, Judicial, and Commissioner–will be lower-information than the 2017 DA.

Consider the simplistic analysis I did for the 2017 Court of Common Pleas. That analysis found that endorsed candidates performed 0.9 percentage points better, in a race that took 4.3% of the vote to win. That estimate is the analog to the 11 point DA effect in the first table. We found that half of the 11 DA points was actually causal, so 0.45 points is a naive guess of the effect in judicial races.
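
The back-of-envelope scaling can be written out explicitly. This sketch uses only numbers quoted in the text; the variable names are mine, and the exact-ratio version is shown next to the rounded "half" only for comparison.

```r
naive_da_gap <- 11          # naive endorsed vs. un-endorsed gap in the DA race (points)
causal_da_effect <- 5.8     # boundary-based estimate in the DA race (points)
naive_judicial_gap <- 0.9   # naive gap in the Common Pleas analysis (points)
win_threshold <- 4.3        # share of the vote it took to win a judgeship (percent)

# Share of the naive DA gap that survived the boundary analysis.
causal_share <- causal_da_effect / naive_da_gap

# Naive judicial guess, using "half" as in the text and using the exact ratio.
judicial_guess_half <- naive_judicial_gap * 0.5
judicial_guess_ratio <- naive_judicial_gap * causal_share

# How large the guess is relative to what it takes to win.
share_of_threshold <- judicial_guess_half / win_threshold

print(round(c(causal_share, judicial_guess_half, judicial_guess_ratio, share_of_threshold), 2))
```

Either way the guess lands a little under half a point, roughly a tenth of the vote share needed to win.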

But there are two more changes. First, taking half of the effect is almost certainly too conservative for judges. There are few pre-existing preferences among voters, so much less of that correlation will be “wards endorsing candidates that are already popular”. The causal part will be higher.

But second, the wards that I had data for in that analysis are all the wards with the strongest endorsement effect in this one: 9, 30, 52, and 50 were all among the 18 wards I had data for. So that estimate might be higher, too, than if we had data for every ward.

We end up in between. The Ward endorsements–especially in the top wards on the chart–are effective but not decisive. They are powerful enough that they likely decide close judicial races, but not enough to have changed 2017’s DA race.

Appendix: Ward Map

View code
ggplot(ggwards, aes(x = long, y=lat)) +
  geom_polygon(aes(group=group), fill = strong_green, color = "white") +
  geom_text(data = ward_centroids, aes(x=x, y=y, label=ward), color = "white") +
  theme_map_sixtysix() +
  coord_map() +
  ggtitle("Philadelphia's Wards")

plot of chunk ward_map

The neighborhoods (or wards) that decide District 7

Could Maria Lose?

Maria Quiñones-Sánchez is facing a significant challenge for the second election in a row. The three-term councilmember was challenged by Manny Morales four years ago, and eked out a 53.5% to 46.5% win, a margin of only 868 votes. She’s being challenged again this year by state Rep. and ward leader Angel Cruz. What should we expect?

The 7th district is decidedly different from Jannie Blackwell’s 3rd or Kenyatta Johnson’s 2nd. Those heavily-segregated districts had clear racial coalitions that swung differently from year to year.

North Philly’s 7th is segregated, but more homogeneously so. It’s Philadelphia’s most Hispanic district, with a predominantly Black section of Frankford in the Northeast and the beginnings of White Kensington gentrification expanding up in the South. In the manner of Philadelphia’s Hispanic neighborhoods, it has very low turnout.

But the vote this May won’t split along racial lines. Instead, what matters is the Wards.

Four years ago, three of the district’s ward leaders coordinated to support Quiñones-Sánchez’s challenger, with heavy involvement from Johnny Doc’s Local 98, including more than $100,000 in contributions and over 1,000 “voter assist requests” for help in the voting booth.

The leaders are coordinating a challenge again, with a candidate, Cruz, who apparently has less baggage than Morales.

(I can’t quite do justice to the story of this race. Check out the links above and Billy Penn’s summary.)

The challenge seems stronger this year, with eight of the twelve ward leaders in the district supporting Cruz in the party endorsement. But Quiñones-Sánchez held it off last time. What should we expect?

District 7’s voting blocs

The 7th Council district covers North Philly, with 6th and 9th Streets serving as the Western border.

View code
library(tidyverse)
library(rgdal)
library(rgeos)
library(sp)
library(ggmap)

sp_council <- readOGR("../../../data/gis/city_council/Council_Districts_2016.shp", verbose = FALSE)
sp_council <- spChFIDs(sp_council, as.character(sp_council$DISTRICT))

sp_divs <- readOGR("../../../data/gis/2016/2016_Ward_Divisions.shp", verbose = FALSE)
sp_divs <- spChFIDs(sp_divs, as.character(sp_divs$WARD_DIVSN))
sp_divs <- spTransform(sp_divs, CRS(proj4string(sp_council)))

load("../../../data/processed_data/df_major_2017_12_01.Rda")

ggcouncil <- fortify(sp_council) %>% mutate(council_district = id)
ggdivs <- fortify(sp_divs) %>% mutate(WARD_DIVSN = id)
View code
## Need to add the 2015 District Council election results
raw_d2 <-  read.csv("../../../data/raw_election_data/2015_primary.csv") 
raw_d2 <- raw_d2 %>% 
  filter(OFFICE == "DISTRICT COUNCIL-7TH DISTRICT-DEM") %>%
  mutate(
    WARD = sprintf("%02d", asnum(WARD)),
    DIV = sprintf("%02d", asnum(DIVISION))
  )

load('../../../data/gis_crosswalks/div_crosswalk_2013_to_2016.Rda')
crosswalk_to_16 <- crosswalk_to_16 %>% group_by() %>%
  mutate(
    WARD = sprintf("%02s", as.character(WARD)),
    DIV = sprintf("%02s", as.character(DIV))
  )

d2 <- raw_d2 %>% 
  left_join(crosswalk_to_16) %>%
  group_by(WARD16, DIV16, OFFICE, CANDIDATE) %>%
  summarise(VOTES = sum(VOTES * weight_to_16)) %>%
  mutate(PARTY="DEMOCRATIC", year="2015", election="primary")
df_major <- bind_rows(df_major, d2)
View code
races <- tribble(
  ~year, ~OFFICE, ~office_name,
  "2015", "MAYOR", "Mayor",
  "2015", "DISTRICT COUNCIL-7TH DISTRICT-DEM", "Council 7th District",
  "2016", "PRESIDENT OF THE UNITED STATES", "President",
  "2017", "DISTRICT ATTORNEY", "District Attorney"
) %>% mutate(election_name = paste(year, office_name))

candidate_votes <- df_major %>% 
  filter(election == "primary" & PARTY == "DEMOCRATIC") %>%
  inner_join(races %>% select(year, OFFICE)) %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  mutate(
    total_votes = sum(VOTES),
    pvote = VOTES / sum(VOTES)
  ) %>% 
  group_by()
  
turnout_df <- candidate_votes %>%
  filter(!grepl("COUNCIL", OFFICE)) %>% 
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  summarise(total_votes = sum(VOTES)) %>%
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, AREA_SFT)
  )

turnout_df$AREA_SFT <- asnum(turnout_df$AREA_SFT)
View code
get_labpt_df <- function(sp){
  mat <- sapply(sp@polygons, slot, "labpt")
  df <- data.frame(x = mat[1,], y=mat[2,])
  return(
    cbind(sp@data, df)
  )
}

ggplot(ggcouncil, aes(x=long, y=lat)) +
  geom_polygon(
    aes(group=group),
    fill = strong_green, color = "white", size = 1
  ) +
  geom_text(
    data = get_labpt_df(sp_council),
    aes(x=x,y=y,label=DISTRICT)
  ) +
  theme_map_sixtysix() +
  coord_map() +
  ggtitle("Council Districts")

plot of chunk council_map

View code
DISTRICT <- "7"
sp_district <- sp_council[row.names(sp_council) == DISTRICT,]

bbox <- sp_district@bbox
## expand the bbox 20% for mapping
bbox <- rowMeans(bbox) + 1.2 * sweep(bbox, 1, rowMeans(bbox))

if(file.exists("map_cache.Rda")){
  load("map_cache.Rda")
} else {
    basemap <- get_map(bbox, maptype="toner-lite", filename="map_cache.png")
    save(basemap, file="map_cache.Rda")
}

district_map <- ggmap(
  basemap, 
  extent="normal", 
  base_layer=ggplot(ggcouncil, aes(x=long, y=lat, group=group)),
  maprange = FALSE
) 
## without basemap:
# district_map <- ggplot(ggcouncil, aes(x=long, y=lat, group=group))

district_map <- district_map +
  theme_map_sixtysix() +
  coord_map(xlim=bbox[1,], ylim=bbox[2,])


sp_divs$council_district <- over(
  gCentroid(sp_divs, byid = TRUE), 
  sp_council
)$DISTRICT

polygon_in_bbox <- function(p) {
  coords <- p@Polygons[[1]]@coords
  any(
    coords[,1] > bbox[1,1] &
      coords[,1] < bbox[1,2] &
      coords[,2] > bbox[2,1] &
      coords[,2] < bbox[2,2] 
  )
}

sp_divs$in_bbox <- sapply(
  sp_divs@polygons,
  polygon_in_bbox
)

ggdivs <- ggdivs %>% 
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, in_bbox)
  )

district_map +
  geom_polygon(
    aes(alpha = (id == DISTRICT)),
    fill="black",
    color = "grey50",
    size=2
  ) +
  scale_alpha_manual(values = c(`TRUE` = 0.2, `FALSE` = 0), guide = FALSE) +
  ggtitle(sprintf("Council District %s", DISTRICT))

plot of chunk district_map

The district has the lowest turnout in the city. Philadelphia’s Hispanic neighborhoods have very low turnout, and this district is the most Hispanic. Curiously, not only does the district have low turnout in Presidential elections, but its turnout drops disproportionately further in municipal elections: even residents who vote for President are less likely to vote in other years.

View code
# hist(turnout_df$total_votes / turnout_df$AREA_SFT, breaks = 1000)

turnout_df <- turnout_df %>%
  left_join(races)

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(turnout_df, by =c("id" = "WARD_DIVSN")),
    aes(fill = pmin(total_votes / AREA_SFT, 0.0005) * 5280^2)
  ) +
  scale_fill_viridis_c(
    "Votes per square mile", 
    labels=scales::comma, 
    guide=guide_colorbar(label.theme=element_text(angle=90, size = 10), label.hjust=1)
  ) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  facet_wrap(~ election_name) +
  expand_limits(fill=0) +
  ggtitle(
    "Votes per square mile in the Democratic Primary", 
    sprintf("Council District %s", DISTRICT)
  ) +
  theme(legend.position = "bottom", legend.direction = "horizontal")

plot of chunk turnout_map

The district as a whole cast 25,000 votes in the 2016 Presidential primary, 12,000 the last time Quiñones-Sánchez ran, and only 6,000 in the 2017 District Attorney race. It did not see the Krasner surge.

Demographically, the district is Philadelphia’s most heavily Hispanic, though Frankford in its Northeast is predominantly Black:

View code
bg_17_acs <- read.csv("../../../data/census/acs_2013_2017_phila_bg_race_income.csv")
bg_17_acs <- bg_17_acs %>% 
  mutate(Geo_FIPS = as.character(Geo_FIPS)) %>%
  select(Geo_FIPS, pop, pop_nh_white, pop_nh_black, pop_nh_asian, pop_hisp, pop_median_income_2017)

library(tigris)
options(tigris_use_cache = TRUE)
bg_shp <- block_groups(42, 101, year = 2015)
bg_shp <- spChFIDs(bg_shp, as.character(bg_shp$GEOID))
bg_shp <- spTransform(bg_shp, CRS(proj4string(sp_divs)))

bg_shp$in_bbox <- sapply(
  bg_shp@polygons,
  polygon_in_bbox
)

gg_bgs <- fortify(bg_shp)
gg_bgs <- gg_bgs %>%
  left_join(bg_shp@data[,c("GEOID", "ALAND", "in_bbox")], by = c("id" = "GEOID")) %>%
  left_join(bg_17_acs, by = c("id" = "Geo_FIPS"))

district_map +
  geom_polygon(
    data = gg_bgs %>%
      filter(in_bbox) %>%
      gather("key", "race_pop",pop_nh_white, pop_nh_black, pop_nh_asian, pop_hisp) %>%
      mutate(
        pct_pop = 100 * race_pop / pop,
        race = c(
          pop_hisp = "Hispanic", pop_nh_white="NH White", pop_nh_black="Black", pop_nh_asian="Asian"
        )[key]
      ),
    aes(fill = pct_pop)
  ) + 
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  facet_wrap(~race) +
  scale_fill_viridis_c("Percent of\n Population") +
  theme(legend.position = "right") +
  ggtitle(sprintf("Race and Ethnicity of District %s", DISTRICT))

plot of chunk census

The district is less obviously politically split than other parts of the city. When it does split, it often does so for Latino candidates. Below are the results from the last race for the 7th and three other recent, compelling Democratic Primary races: 2015 Mayor, 2016 President, and 2017 District Attorney. The maps below show the vote for the top two candidates in District 7 in each race (for the 2015 Council race, that is Quiñones-Sánchez and Morales).

View code
candidate_votes <- candidate_votes %>%
  left_join(sp_divs@data %>% select(WARD_DIVSN, council_district))

## Choose the top two candidates in the district for each race
# candidate_votes %>%
#   group_by(OFFICE, year, CANDIDATE) %>%
#   summarise(
#     city_votes = sum(VOTES),
#     district_votes = sum(VOTES * (council_district == DISTRICT))
#   ) %>%
#   arrange(desc(city_votes)) %>%
#   filter(OFFICE == "MAYOR")

candidates_to_compare <- tribble(
  ~year, ~OFFICE, ~CANDIDATE, ~candidate_name, ~row,
  "2015", "DISTRICT COUNCIL-7TH DISTRICT-DEM", "MANNY MORALES", "Manny Morales", 1,
  "2015", "DISTRICT COUNCIL-7TH DISTRICT-DEM", "MARIA QUINONES SANCHEZ", "Maria Quiñones-Sánchez", 2,
  "2015", "MAYOR", "JIM KENNEY", "Jim Kenney",  2,
  "2015", "MAYOR", "NELSON DIAZ", "Nelson Diaz", 1,
  "2016", "PRESIDENT OF THE UNITED STATES", "BERNIE SANDERS", "Bernie Sanders", 2,
  "2016", "PRESIDENT OF THE UNITED STATES", "HILLARY CLINTON", "Hillary Clinton", 1,
  "2017", "DISTRICT ATTORNEY", "LAWRENCE S KRASNER", "Larry Krasner", 2,
  "2017", "DISTRICT ATTORNEY", "RICH NEGRIN","Rich Negrin", 1
)

candidate_votes <- candidate_votes %>%
  left_join(races) %>%
  left_join(candidates_to_compare)

vote_adjustment <- function(pct_vote, office){
  ifelse(office == "COUNCIL AT LARGE", pct_vote * 4, pct_vote)
}

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(
        candidate_votes %>% filter(!is.na(row))
      ),
    aes(fill = 100 * vote_adjustment(pvote, OFFICE))
  ) +
  scale_fill_viridis_c("Percent of Vote") +
  theme(
    legend.position =  "bottom",
    legend.direction = "horizontal",
    legend.justification = "center"
  ) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  geom_label(
    data=candidates_to_compare %>% left_join(races),
    aes(label = candidate_name),
    group=NA,
    hjust=0, vjust=1,
    x=bbox[1,1],
    y=bbox[2,2]
  ) +
  facet_grid(row ~ election_name) +
  theme(strip.text.y = element_blank()) +
  ggtitle(
    sprintf("Candidate performance in District %s", DISTRICT) 
    # "Percent of vote (times 4 for Council, times 1 for other offices)"
  )

plot of chunk proportion

The district split for Mayor and for District Attorney along racial lines: the densest Latino neighborhoods voted for Diaz and Negrin, while the rest of the district voted for the citywide winners Kenney and Krasner. However, the cleavage was geographically different for Quiñones-Sánchez v. Morales, presumably because both candidates were Latinx. That’s true this time around, too.

Just because there wasn’t a racial split doesn’t mean the district voted uniformly. There were indeed stark variations in how Quiñones-Sánchez did, but the reason isn’t obvious. She did well in the Northwest of the district, especially West of 6th St, and in the Northeast of the district, except South of the Boulevard and East of Oxford.

What gives? Those are all Ward boundaries.

View code
wards <- readOGR("../../../data/gis/2016","2016_Wards", verbose=FALSE) %>%
  spTransform(CRS(proj4string(sp_divs)))

ggwards <- fortify(wards)

wards_centers <- sapply(wards@polygons, slot, "labpt") %>% t
wards_centers <- as.data.frame(wards_centers)
names(wards_centers) <- c("x", "y")

wards@data <- cbind(wards@data, wards_centers)


district_map + 
  geom_polygon(data = ggwards, fill = NA, color = strong_red, size= 2) +
  geom_polygon(
    data = ggcouncil %>% filter(council_district == 7), 
    fill=NA, 
    color = "black", 
    size = 2
  ) +
  geom_text(
    data = wards@data,
    aes(x=x, y=y, label=WARD),
    group = NA,
    color = strong_red,
    fontface="bold"
  ) +
  ggtitle("The wards of District 7")

plot of chunk wards

Quiñones-Sánchez’s performance is a potent example of the power of Ward endorsements: she performed very differently in different wards, in some cases on literally the other side of the street. That thin sliver in the Northwest of the District where she did exceptionally well was the 43rd Ward. She received large percentages in the 23rd and 54th in the Northeast, too, but the region where she did poorly, South of the Boulevard and East of Oxford, lines up exactly with the boundaries of the 62nd.

So the question for the 2019 race becomes how the ward endorsements will shake out and how powerful each ward is, both in terms of ability to swing the vote and typical turnout. Below are measures of their strength. I’ve also pulled in the Ward leaders’ endorsements from the DCC vote.
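
The ward-level percentages are vote-weighted averages of division results. A minimal sketch with made-up division counts (the numbers and the `votes_qs` column are hypothetical) shows why weighting by votes matters: the weighted mean of division shares equals the ward's pooled share, while a plain mean does not.

```r
# Hypothetical division-level results for two wards.
divs <- data.frame(
  ward = c("43", "43", "62", "62"),
  votes_qs = c(80, 30, 20, 45),        # votes for one candidate
  total_votes = c(100, 50, 80, 90),    # all votes cast in the division
  stringsAsFactors = FALSE
)
divs$p_qs <- divs$votes_qs / divs$total_votes

# Vote-weighted ward share, in percent.
ward_share <- sapply(
  split(divs, divs$ward),
  function(d) 100 * weighted.mean(d$p_qs, w = d$total_votes)
)

# Unweighted mean of division shares, for comparison.
ward_share_unweighted <- sapply(
  split(divs, divs$ward),
  function(d) 100 * mean(d$p_qs)
)

print(round(ward_share, 1))
print(round(ward_share_unweighted, 1))
```

With low and uneven turnout across divisions, as in District 7, the two can diverge noticeably, so the weighted version is the one that matches actual ward vote totals.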

View code
## Get vote-weighted populations
div_centroids <- gCentroid(sp_divs[sp_divs$council_district == DISTRICT,], byid=TRUE)
div_centroids$WARD_DIVSN <- attr(div_centroids@coords, "dimnames")[[1]]
div_centroids$bg_GEOID <- over(div_centroids, bg_shp)$GEOID
div_centroids@data <- left_join(div_centroids@data, bg_17_acs, by = c("bg_GEOID" = "Geo_FIPS"))

district_7_results <- df_major %>%
  filter(
    year == 2015 & grepl("7TH DISTRICT", OFFICE) & CANDIDATE != "Write In"
  ) %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  select(WARD_DIVSN, CANDIDATE, VOTES) %>%
  spread(CANDIDATE, VOTES) %>%
  mutate(
    total_votes = (`MARIA QUINONES SANCHEZ` + `MANNY MORALES`),
    p_quinones_sanchez = `MARIA QUINONES SANCHEZ` / total_votes
  )

div_centroids@data <- left_join(div_centroids@data, district_7_results)

ward_pops <- div_centroids@data %>%
  mutate(ward = substr(WARD_DIVSN, 1, 2)) %>%
  group_by(ward) %>%
  summarise(
    p_quinones_sanchez = 100 * weighted.mean(p_quinones_sanchez, w = total_votes),
    pct_nh_white = 100 * weighted.mean(pop_nh_white / pop, w = total_votes),
    pct_nh_black = 100 * weighted.mean(pop_nh_black / pop, w = total_votes),
    pct_nh_asian = 100 * weighted.mean(pop_nh_asian / pop, w = total_votes),
    pct_hisp = 100 * weighted.mean(pop_hisp / pop, w = total_votes),
    council_votes = sum(total_votes)
  )

ward_results <- df_major %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  inner_join(
    tribble(
      ~year, ~election, ~PARTY, ~OFFICE,
      "2016", "primary", "DEMOCRATIC", "PRESIDENT OF THE UNITED STATES",
      "2017", "primary", "DEMOCRATIC", "DISTRICT ATTORNEY"
    )
  ) %>%
  inner_join(div_centroids@data) %>%
  group_by(year, WARD16) %>%
  summarise(total_votes = sum(VOTES)) %>%
  ungroup() %>%
  group_by(year) %>%
  mutate(pct_of_year = 100 * total_votes / sum(total_votes)) %>%
  gather("key", "value", total_votes, pct_of_year) %>%
  unite("key", key, year) %>%
  spread(key, value)
    
ward_pops <- ward_pops %>% 
  left_join(
    ward_results %>% rename(ward = WARD16)
  ) %>%
  ungroup() %>%
  mutate(pct_of_year_2015 = 100 * council_votes / sum(council_votes))

ward_leaders <- tribble(
  ~ward, ~leader, ~endorsement,
  "07", "Angel Cruz", "Cruz",
  "18", "Theresa Alicea", "Cruz",
  "19", "Carlos Matos","Cruz",
  "23", "Timothy Savage","Quiñones-Sánchez",
  "25", "Thomas Johnson","Cruz",
  "31", "Margaret Rzepski","Cruz",
  "33", "Donna Aument","Cruz",
  "42", "Sharon Vaughn","Quiñones-Sánchez",
  "43", "Emilio Vazquez","Quiñones-Sánchez",
  "49", "Shirley Gregory","Cruz",
  "54", "Alan Butkovitz","Quiñones-Sánchez",
  "62", "Margaret Tartaglione","Cruz"
)

ward_pops <- ward_pops %>% left_join(ward_leaders)

knitr::kable(
  ward_pops %>%
    select(ward, council_votes, pct_of_year_2015, p_quinones_sanchez, leader, endorsement, pct_hisp, pct_nh_white, pct_nh_black, total_votes_2016, pct_of_year_2016, total_votes_2017, pct_of_year_2017) %>%
    arrange(desc(council_votes)),
    digits=0, 
    format.args=list(big.mark=','),
    col.names=c("Ward", "Votes for the 7th in 2015 Primary", "Pct of District", "% for Quiñones-Sánchez",  "Leader", "Endorsement", "% Hispanic", "% White", "% Black", "Votes in 2016 Primary", "Pct of District", "Votes in 2017 Primary", "Pct of District")
  )
| Ward | Votes for the 7th in 2015 Primary | Pct of District | % for Quiñones-Sánchez | Leader | Endorsement | % Hispanic | % White | % Black | Votes in 2016 Primary | Pct of District | Votes in 2017 Primary | Pct of District |
|:---|---:|---:|---:|:---|:---|---:|---:|---:|---:|---:|---:|---:|
| 23 | 2,108 | 17 | 71 | Timothy Savage | Quiñones-Sánchez | 28 | 19 | 46 | 4,021 | 16 | 1,284 | 22 |
| 62 | 1,774 | 14 | 33 | Margaret Tartaglione | Cruz | 31 | 22 | 41 | 3,546 | 14 | 913 | 16 |
| 19 | 1,683 | 14 | 37 | Carlos Matos | Cruz | 77 | 6 | 15 | 2,817 | 11 | 589 | 10 |
| 07 | 1,655 | 14 | 49 | Angel Cruz | Cruz | 80 | 8 | 11 | 3,340 | 13 | 549 | 9 |
| 33 | 1,455 | 12 | 57 | Donna Aument | Cruz | 63 | 10 | 18 | 3,337 | 13 | 566 | 10 |
| 42 | 1,177 | 10 | 63 | Sharon Vaughn | Quiñones-Sánchez | 64 | 8 | 16 | 2,581 | 10 | 543 | 9 |
| 43 | 1,124 | 9 | 65 | Emilio Vazquez | Quiñones-Sánchez | 66 | 3 | 29 | 2,259 | 9 | 445 | 8 |
| 18 | 611 | 5 | 51 | Theresa Alicea | Cruz | 45 | 29 | 16 | 1,340 | 5 | 527 | 9 |
| 54 | 311 | 3 | 66 | Alan Butkovitz | Quiñones-Sánchez | 17 | 18 | 49 | 906 | 4 | 196 | 3 |
| 25 | 133 | 1 | 59 | Thomas Johnson | Cruz | 47 | 33 | 15 | 350 | 1 | 77 | 1 |
| 49 | 117 | 1 | 72 | Shirley Gregory | Cruz | 28 | 0 | 72 | 192 | 1 | 44 | 1 |
| 31 | 98 | 1 | 63 | Margaret Rzepski | Cruz | 23 | 57 | 8 | 265 | 1 | 129 | 2 |

The ward with the most votes for the 7th in 2015 also voted hard for Quiñones-Sánchez: the 23rd. Her 71% dominance of the 2,108 votes cast gave her an 886 vote edge, more than what she won the entire District by. That ward, which includes the district’s predominantly Black neighborhoods, turns out stronger than the rest. It represented 17% of the district’s votes in 2015, and then a whopping 22% of the votes in low-turnout 2017.
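The comparison between her edge in the 23rd and her district-wide margin can be checked directly from the table. The sketch below is an illustration, not part of the original analysis: the data frame `wards_2015` and its column names are made up for this example, and because it uses the rounded percentages shown in the table, its results land within a few votes of the figures quoted in the text.

```r
# Ward-level figures copied from the table above (2015 primary, District 7
# portion of each ward). Percentages are the rounded values shown there.
wards_2015 <- data.frame(
  ward = c("23", "62", "19", "07", "33", "42", "43", "18", "54", "25", "49", "31"),
  votes = c(2108, 1774, 1683, 1655, 1455, 1177, 1124, 611, 311, 133, 117, 98),
  pct_mqs = c(71, 33, 37, 49, 57, 63, 65, 51, 66, 59, 72, 63),
  stringsAsFactors = FALSE
)

# In a two-candidate race, her vote edge in a ward is
# votes * (her share - opponent's share) = votes * (2 * pct - 100) / 100.
wards_2015$edge_mqs <- with(wards_2015, votes * (2 * pct_mqs - 100) / 100)

edge_23 <- wards_2015$edge_mqs[wards_2015$ward == "23"]
district_margin <- sum(wards_2015$edge_mqs)

edge_23          # her edge in the 23rd alone
district_margin  # her edge summed over every ward in the district
```

Summing the same column over only the wards with negative values shows how much of that cushion the challenger's strongest wards took back.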

The next three most dominant wards were 62, 19, and 7, and those went for Morales with 67, 63, and 51% of the vote. (Notice that 62 is actually split by Council Districts; the number above is only for District 7). Those are the three wards led by State Senator Margaret Tartaglione, Carlos Matos, and Angel Cruz himself, the ward leaders that organized the challenge. Quiñones-Sánchez won all of the other Wards, but those three were enough to make the race close.

We can simplify the table above by combining all of the wards whose leaders supported Quiñones-Sánchez and those whose leaders supported Cruz.

View code
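get_line <- function(x_total_votes, y_total_votes){
  ## A candidate wins when p_x * t_x + p_y * t_y > 50, where p_x and p_y are
  ## their percents in the two groups of wards and t_x, t_y are each group's
  ## share of the total vote. Solving for p_y gives the boundary line
  ## p_y = 50 / t_y - (t_x / t_y) * p_x.
  tot <- x_total_votes + y_total_votes
  tx <- x_total_votes / tot
  ty <- y_total_votes / tot

  slope <- -tx / ty
  intercept <- 50 / ty  # use 50 since percents are on a 0-100 scale
  c(intercept, slope)
}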

endorsement_summary <- ward_pops %>%
  group_by(endorsement) %>%
  summarise(
    p_quinones_sanchez = weighted.mean(p_quinones_sanchez, w = council_votes),
    council_votes = sum(council_votes),
    total_votes_2016 = sum(total_votes_2016),
    total_votes_2017 = sum(total_votes_2017)
  )

candidate_results = with(
  endorsement_summary,
  tribble(
    ~candidate, 
    ~p_in_mqs_wards,
    ~p_in_challenger_wards, 
    ~total_votes_in_mqs_wards, 
    ~total_votes_in_challenger_wards,
    "Quiñones-Sánchez", 
    p_quinones_sanchez[endorsement == "Quiñones-Sánchez"],
    p_quinones_sanchez[endorsement == "Cruz"],
    council_votes[endorsement == "Quiñones-Sánchez"],
    council_votes[endorsement == "Cruz"],
    "Morales", 
    100 - p_quinones_sanchez[endorsement == "Quiñones-Sánchez"],
    100 - p_quinones_sanchez[endorsement == "Cruz"],
    council_votes[endorsement == "Quiñones-Sánchez"],
    council_votes[endorsement == "Cruz"]
  )
) %>%
  mutate(
    votes_in_mqs_wards = p_in_mqs_wards * total_votes_in_mqs_wards / 100,
    votes_in_challenger_wards = p_in_challenger_wards * total_votes_in_challenger_wards / 100
  )

knitr::kable(
  candidate_results %>% select(
    candidate,
    p_in_mqs_wards,
    votes_in_mqs_wards,
    p_in_challenger_wards,
    votes_in_challenger_wards
  ),
  digits=0, 
  format.args=list(big.mark=','),
  col.names=c("Candidate", "Percent in MQS-Endorsed Wards", "Votes in MQS-Endorsed Wards", "Percent in Cruz-Endorsed Wards",  "Votes in Cruz-Endorsed Wards")
)
| Candidate | Percent in MQS-Endorsed Wards | Votes in MQS-Endorsed Wards | Percent in Cruz-Endorsed Wards | Votes in Cruz-Endorsed Wards |
|:---|---:|---:|---:|---:|
| Quiñones-Sánchez | 67 | 3,174 | 45 | 3,383 |
| Morales | 33 | 1,546 | 55 | 4,143 |

The four wards that backed Quiñones-Sánchez constituted only 39% of the votes in 2015, but she won them 2:1. The other eight wards combined for 61% of the vote, but Morales only won them 5:4. Quiñones-Sánchez’s wards represented more of the votes in 2017, boding well for this year, but it’s also likely that Cruz will do better in his wards than Morales did.
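To make the trade-off concrete, a candidate's district-wide percent is just the turnout-weighted average of their percent in each group of wards. The sketch below works that out from the summary table above; the helper names `overall_pct` and `needed_in_cruz_wards` are invented for this illustration, and the inputs are the rounded table values, so treat the outputs as approximate.

```r
# Vote totals by endorsement group, from the summary table above (2015 primary).
votes_mqs_wards  <- 3174 + 1546   # wards whose leaders back Quiñones-Sánchez
votes_cruz_wards <- 3383 + 4143   # wards whose leaders back Cruz

# Each group's share of the district's total vote.
share_mqs_wards  <- votes_mqs_wards / (votes_mqs_wards + votes_cruz_wards)
share_cruz_wards <- 1 - share_mqs_wards

# Hypothetical helper: district-wide percent from the two group percents.
overall_pct <- function(p_mqs_wards, p_cruz_wards) {
  share_mqs_wards * p_mqs_wards + share_cruz_wards * p_cruz_wards
}

overall_pct(67, 45)  # Quiñones-Sánchez in 2015
overall_pct(33, 55)  # Morales in 2015

# Hypothetical helper: the percent a candidate needs in the Cruz wards to
# reach exactly 50 district-wide, given their percent in the MQS wards.
needed_in_cruz_wards <- function(p_mqs_wards) {
  (50 - share_mqs_wards * p_mqs_wards) / share_cruz_wards
}

needed_in_cruz_wards(67)
```

Swapping in the 2017 vote totals for the two groups shifts the weights toward the Quiñones-Sánchez wards, which lowers the percent she would need in the Cruz wards.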

View code
line_2017 <- with(
  endorsement_summary,
  get_line(
    total_votes_2017[endorsement == "Quiñones-Sánchez"],
    total_votes_2017[endorsement == "Cruz"]
  )
)

line_2015 <- with(
  endorsement_summary,
  get_line(
    council_votes[endorsement == "Quiñones-Sánchez"],
    council_votes[endorsement == "Cruz"]
  )
)

library(ggrepel)
ggplot(
  candidate_results,
  aes(
    x=p_in_mqs_wards,
    y=p_in_challenger_wards
  )
) +
  geom_point() +
  geom_text_repel(aes(label=candidate)) +
  geom_abline(
    intercept = c(line_2015[1], line_2017[1]),
    slope = c(line_2015[2], line_2017[2]),
    linetype="dashed"
  ) +
  coord_fixed() +
  scale_x_continuous(
    "Percent in wards that endorsed Quiñones-Sánchez",
    breaks = seq(0,100,10)
  ) +
  scale_y_continuous(
    "Percent in wards that endorsed Cruz",
    breaks = seq(0, 100, 10)
  ) +
  annotate(
    geom="text",
    label=paste(c(2015, 2017), "turnout"),
    x=c(10, 8),
    y=c(
      line_2015[1] + 10 * line_2015[2],
      line_2017[1] + 8 * line_2017[2]
    ),
    hjust=0,
    vjust=-0.2,
    angle = atan(c(line_2015[2], line_2017[2])) / pi * 180,
    color="grey40"
  )+
  annotate(
    geom="text",
    x = 80,
    y=75,
    label="Candidate wins",
    fontface="bold",
    color = strong_green
  ) +
  geom_hline(yintercept = 50, color="grey50") +
  geom_vline(xintercept = 50, color="grey50")+
  expand_limits(x=100, y=30)+
  theme_sixtysix() +
  ggtitle(
    "The strength of District 7's wards in 2015",
    "Candidates to the top-right of dashed lines win."
  )

plot of chunk plot

Looking to May

So what are we to make of this race? Keep your eyes on the mobilization behind the Ward endorsements, especially the top turnout wards. Cruz will presumably do even better in his own ward than Morales did, so Quiñones-Sánchez’s success will likely hinge on continued high turnout in the 23rd and trying to consolidate the votes of the rest.

She managed to do that just well enough four years ago to eke out a 7 point victory. This year she faces a candidate without homophobic Facebook posts who also happens to be a Ward leader. If the results in every ward were the same as in 2015 except Angel Cruz managed to win 77% of the vote in his own ward, the race would be an exact tie.
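That tie scenario is simple arithmetic on the tables above. Here is a rough sketch, assuming the 7th Ward casts the same number of votes as in 2015 and that every other ward repeats its 2015 result; the variable names are made up for this example, and the ward percentage is the rounded value from the table.

```r
# Figures from the tables above (2015 primary).
total_votes <- 12246                          # sum of the ward vote column
margin_2015 <- (3174 + 3383) - (1546 + 4143)  # Quiñones-Sánchez minus Morales

ward_7_votes <- 1655
ward_7_pct_mqs_2015 <- 49                     # rounded, from the ward table
ward_7_pct_mqs_hypothetical <- 100 - 77       # if Cruz wins 77% there

# Every point she loses in the ward is also a point her opponent gains,
# so the margin moves by twice the change in her share.
lost_points <- ward_7_pct_mqs_2015 - ward_7_pct_mqs_hypothetical
margin_shift <- ward_7_votes * 2 * lost_points / 100

hypothetical_margin <- margin_2015 - margin_shift
hypothetical_margin_pct <- 100 * hypothetical_margin / total_votes

hypothetical_margin      # remaining edge in votes
hypothetical_margin_pct  # remaining edge as a percent of all votes
```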

It’ll be close.