The neighborhoods that decide Council District 3

Could Jannie Lose?

Jannie Blackwell, the six-term councilmember from West Philly’s District 3, is being challenged by Jamie Gauthier. The race appears to be shaping up as a reform-minded challenger against a powerful longtime incumbent, and it has generated serious buzz following recent protests and homophobic remarks. Could it really be close?

More generally, I’m curious about the way Philadelphia’s gentrification and the 2016 election have changed electoral power structures. Even in 2015, Helen Gym won largely on the votes of Center City and the ring around it. But then 2016 happened, and turnout in those neighborhoods reached unprecedented heights. Exactly how powerful is that cohort? And while they’re strong citywide, have they grown enough in specific districts to dictate outcomes there?

Blackwell hasn’t faced a primary challenger since 1999, so we don’t have any evidence on her individual strength. Let’s instead look at recent competitive elections that could illustrate the neighborhoods’ relative views.

What are the neighborhood cohorts that will decide District 3? Is the Krasner/Gym base strong enough on its own to dictate the election, or does the traditionally decisive West and Southwest Philly base still hold the balance?

District 3’s voting blocs

In the last three Democratic primaries, District 3 has displayed two clear voting blocs: University City and farther West/Southwest Philly.

View code
library(tidyverse)
library(rgdal)
library(rgeos)
library(sp)
library(ggmap)

sp_council <- readOGR("../../../data/gis/city_council/Council_Districts_2016.shp", verbose = FALSE)
sp_council <- spChFIDs(sp_council, as.character(sp_council$DISTRICT))

sp_divs <- readOGR("../../../data/gis/2016/2016_Ward_Divisions.shp", verbose = FALSE)
sp_divs <- spChFIDs(sp_divs, as.character(sp_divs$WARD_DIVSN))
sp_divs <- spTransform(sp_divs, CRS(proj4string(sp_council)))

load("../../../data/processed_data/df_major_2017_12_01.Rda")

ggcouncil <- fortify(sp_council) %>% mutate(council_district = id)
ggdivs <- fortify(sp_divs) %>% mutate(WARD_DIVSN = id)
View code
races <- tribble(
  ~year, ~OFFICE, ~office_name,
  "2015", "MAYOR", "Mayor",
  "2015", "COUNCIL AT LARGE", "City Council",
  "2016", "PRESIDENT OF THE UNITED STATES", "President",
  "2017", "DISTRICT ATTORNEY", "District Attorney"
) %>% mutate(election_name = paste(year, office_name))

candidate_votes <- df_major %>% 
  filter(election == "primary" & PARTY == "DEMOCRATIC") %>%
  inner_join(races %>% select(year, OFFICE)) %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  mutate(
    total_votes = sum(VOTES),
    pvote = VOTES / sum(VOTES)
  ) %>% 
  group_by()

turnout_df <- candidate_votes %>%
  filter(OFFICE != "COUNCIL AT LARGE") %>% 
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  summarise(total_votes = sum(VOTES)) %>%
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, AREA_SFT)
  )

# AREA_SFT may be read from the shapefile as a factor; convert via character
turnout_df$AREA_SFT <- as.numeric(as.character(turnout_df$AREA_SFT))

The third council district covers West Philly, from the Schuylkill River to the city line.

View code
get_labpt_df <- function(sp){
  mat <- sapply(sp@polygons, slot, "labpt")
  df <- data.frame(x = mat[1,], y=mat[2,])
  return(
    cbind(sp@data, df)
  )
}

ggplot(ggcouncil, aes(x=long, y=lat)) +
  geom_polygon(
    aes(group=group),
    fill = strong_green, color = "white", size = 1
  ) +
  geom_text(
    data = get_labpt_df(sp_council),
    aes(x=x,y=y,label=DISTRICT)
  ) +
  theme_map_sixtysix() +
  coord_map() +
  ggtitle("Council Districts")

plot of chunk council_map

View code
DISTRICT <- "3"
sp_district <- sp_council[row.names(sp_council) == DISTRICT,]

bbox <- sp_district@bbox
## expand the bbox by 20% for mapping
bbox <- rowMeans(bbox) + 1.2 * sweep(bbox, 1, rowMeans(bbox))

basemap <- get_map(bbox, maptype="toner-lite")

district_map <- ggmap(
  basemap, 
  extent="normal", 
  base_layer=ggplot(ggcouncil, aes(x=long, y=lat, group=group)),
  maprange = FALSE
) +
  theme_map_sixtysix() +
  coord_map(xlim=bbox[1,], ylim=bbox[2,])

sp_divs$council_district <- over(
  gCentroid(sp_divs, byid = TRUE), 
  sp_council
)$DISTRICT

sp_divs$in_bbox <- sapply(
  sp_divs@polygons,
  function(p) {
    coords <- p@Polygons[[1]]@coords
    any(
      coords[,1] > bbox[1,1] &
      coords[,1] < bbox[1,2] &
      coords[,2] > bbox[2,1] &
      coords[,2] < bbox[2,2] 
    )
  }
)

ggdivs <- ggdivs %>% 
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, in_bbox)
  )

district_map +
  geom_polygon(
    aes(alpha = (id == DISTRICT)),
    fill="black",
    color = "grey50",
    size=2
  ) +
  scale_alpha_manual(values = c(`TRUE` = 0.2, `FALSE` = 0), guide = FALSE) +
  ggtitle(sprintf("Council District %s", DISTRICT))

plot of chunk district_map

First, let’s look at the results from four recent, compelling Democratic Primary races: 2015 City Council At Large and Mayor, 2016 President, and 2017 District Attorney. The maps below show the vote for the top two candidates in District 3 (except for City Council in 2015, where I use Helen Gym and Isaiah Thomas, who were 4th and 5th in the district, and 5th and 6th citywide.)

View code
candidate_votes <- candidate_votes %>%
  left_join(sp_divs@data %>% select(WARD_DIVSN, council_district))

## Choose the top two candidates in district 3
# Except for city council, where we choose Gym and Thomas
# candidate_votes %>% 
#   group_by(OFFICE, year, CANDIDATE) %>% 
#   summarise(
#     city_votes = sum(VOTES), 
#     district_votes = sum(VOTES * (council_district == DISTRICT))
#   ) %>% 
#   arrange(desc(district_votes)) %>%
#   filter(OFFICE == "COUNCIL AT LARGE")

candidates_to_compare <- tribble(
  ~year, ~OFFICE, ~CANDIDATE, ~candidate_name, ~row,
  "2015", "COUNCIL AT LARGE", "HELEN GYM", "Helen Gym", 1,
  "2015", "COUNCIL AT LARGE", "ISAIAH THOMAS", "Isaiah Thomas", 2,
  "2015", "MAYOR", "JIM KENNEY", "Jim Kenney",  1,
  "2015", "MAYOR", "ANTHONY HARDY WILLIAMS", "Anthony Hardy Williams", 2,
  "2016", "PRESIDENT OF THE UNITED STATES", "BERNIE SANDERS", "Bernie Sanders", 1,
  "2016", "PRESIDENT OF THE UNITED STATES", "HILLARY CLINTON", "Hillary Clinton", 2,
  "2017", "DISTRICT ATTORNEY", "LAWRENCE S KRASNER", "Larry Krasner", 1,
  "2017", "DISTRICT ATTORNEY", "TARIQ KARIM EL SHABAZZ","Tariq Karim El Shabazz", 2
)

candidate_votes <- candidate_votes %>%
  left_join(races) %>%
  left_join(candidates_to_compare)

vote_adjustment <- function(pct_vote, office){
  ifelse(office == "COUNCIL AT LARGE", pct_vote * 4, pct_vote)
}

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(
        candidate_votes %>% filter(!is.na(row))
      ),
    aes(fill = 100 * vote_adjustment(pvote, OFFICE))
  ) +
  scale_fill_viridis_c("Percent of Vote") +
  theme(
    legend.position =  "bottom",
    legend.direction = "horizontal",
    legend.justification = "center"
  ) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  geom_label(
    data=candidates_to_compare %>% left_join(races),
    aes(label = candidate_name),
    group=NA,
    hjust=0, vjust=1,
    x=-75.258,
    y=39.985
  ) +
  facet_grid(row ~ election_name) +
  theme(strip.text.y = element_blank()) +
  ggtitle(
    sprintf("Candidate performance in District %s", DISTRICT), 
    "Percent of vote (times 4 for Council, times 1 for other offices)"
  )

plot of chunk proportion
Notice two things. First, these competitive elections all split along the same boundaries: University City versus farther West and Southwest Philly. The candidates’ overall results were different (Sanders lost the district, Krasner won), but their relative strengths were in exactly the same places. Second, the split is demographically obvious: University City is predominantly White and wealthier, while farther West is predominantly Black and has lower incomes. Even though Krasner did well across the city, and El Shabazz poorly, Krasner did disproportionately well in University City, and El Shabazz disproportionately well farther West and Southwest.

Turnout is a more complicated story.

View code
# hist(turnout_df$total_votes / turnout_df$AREA_SFT)

turnout_df <- turnout_df %>%
  left_join(races)

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(turnout_df, by =c("id" = "WARD_DIVSN")),
    aes(fill = pmin(total_votes / AREA_SFT, 0.0005))
  ) +
  scale_fill_viridis_c(guide = FALSE) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  facet_wrap(~ election_name) +
  ggtitle(
    "Votes per mile in the Democratic Primary", 
    sprintf("Council District %s", DISTRICT)
  )

plot of chunk turnout_map
The 2017 election was completely different from 2015. In 2015, we saw the West and Southwest Philly neighborhoods dominate the vote, and decide the election. In 2017, University City (really, Cedar Park and Spruce Hill) boomed for Krasner. While Gym, Kenney, and Sanders all monopolized the University City percent of the vote, only Krasner multiplied that effect by monopolizing the turnout.

The change in votes per square mile from 2015 to 2017 illustrates that starkly.

View code
turnout_wide <- turnout_df %>%
  group_by() %>%
  mutate(
    votes_per_sf = total_votes / AREA_SFT,
    key = paste0("votes_", year)
  ) %>%
  select(WARD_DIVSN, key, votes_per_sf) %>%
  spread(key = key, value = votes_per_sf)

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(turnout_wide),
    aes(
      fill = (votes_2017 - votes_2015)*5280^2
    )
  ) +
  scale_fill_gradient2(
    "Change in votes per mile\n  2015 - 2017",
    low=strong_orange,
    mid="white",
    high=strong_purple,
    midpoint=0
  ) +
  geom_polygon(
    fill=NA,
    color = "black",
    size=1
  )  +
  theme(legend.position = "bottom", legend.direction = "horizontal") +
  ggtitle(
    sprintf("Change in votes per mile, District %s", DISTRICT),
    "Orange: More votes in 2015, Purple: More in 2017"
  )

plot of chunk relative_turnout_15_17

To simplify the analysis, let’s divide the District into the two distinct coalitions: the Clinton/Hardy Williams “West & Southwest”, and the Krasner/Sanders “University City”. While they’re obvious on the map, we need a rule to split them up; ideally, there would be natural clusters to divide them. Using the simplistic division based on whether the average Krasner/Sanders vote was greater than 50% is surprisingly useful:

View code
district_categories <- candidate_votes %>% 
    filter(
      council_district == DISTRICT & 
        candidate_name %in% c("Larry Krasner", "Bernie Sanders")
    ) %>%
    group_by(WARD_DIVSN) %>%
    mutate(votes_2016 = total_votes[year == 2016]) %>%
    select(WARD_DIVSN, votes_2016, candidate_name, pvote) %>%
    spread(key=candidate_name, value=pvote)

ggplot(
  district_categories,
  aes(x = 100 * `Bernie Sanders`, y = 100 * `Larry Krasner`)
) +
  geom_point(aes(size = votes_2016), alpha = 0.7) +
  scale_size_area("Total Votes in 2016")+
  theme_sixtysix() +
  xlab("Percent of Vote for Bernie Sanders") +
  ylab("Percent of Vote for Larry Krasner") +
  coord_fixed() + 
  geom_abline(slope = -1, intercept =  100) +
  annotate(
    geom = "text",
    x = c(35, 20),
    y = c(15, 87),
    hjust = 0,
    label = c("West & Southwest", "University City"),
    color = c(strong_green, strong_purple),
    fontface="bold"
  ) +
  ggtitle("Divisions' vote", sprintf("District %s Democratic Primary", DISTRICT))

plot of chunk scatter_bernie_gym
We’ll call the divisions above the line University City, and those below the line West & Southwest.
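As a minimal sketch of that rule, the snippet below classifies a few made-up divisions. The data frame and its numbers are hypothetical, not real results; the only thing taken from the post is the rule itself, namely that an average Sanders/Krasner share above 50% is the same as the two shares summing to more than 1.

```r
# Hypothetical division-level vote shares (proportions); not real data.
divisions <- data.frame(
  WARD_DIVSN = c("A", "B", "C", "D"),
  sanders = c(0.62, 0.28, 0.45, 0.55),
  krasner = c(0.80, 0.30, 0.50, 0.48)
)

# Average share across the two races.
avg_share <- (divisions$sanders + divisions$krasner) / 2

# Above the line (average > 50%) is University City; otherwise West & Southwest.
divisions$cat_name <- ifelse(
  avg_share > 0.5,
  "University City",
  "West & Southwest"
)

print(divisions)
```

Division D shows why the average matters: Krasner fell just under 50% there, but the division still lands in University City because Sanders’s share pulls the average above the line.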

Here’s the map of the cohorts that this categorization gives us.

View code
district_categories$category <- with(
  district_categories,
  (`Bernie Sanders` + `Larry Krasner`) > 1.0
)
district_categories$cat_name <- ifelse(
  district_categories$category,
  "University City",
  "West & Southwest"
)

district_map + 
  geom_polygon(
    data = ggdivs %>% 
      left_join(district_categories) %>% 
      filter(!is.na(cat_name)),
    aes(fill = cat_name)
  ) +
  scale_fill_manual(
    "",
    values = c("University City" = strong_purple, "West & Southwest" = strong_green)
  ) +
  ggtitle(sprintf("District %s neighborhood divisions", DISTRICT))

plot of chunk category_map
Looks reasonable.

How did the candidates do in each of the two sections? The boundary separates drastic performance splits.

View code
neighborhood_summary <- candidate_votes %>% 
  inner_join(candidates_to_compare) %>%
  group_by(candidate_name, election_name) %>%
  mutate(
    citywide_votes = sum(VOTES),
    citywide_pvote = 100 * sum(VOTES) / sum(total_votes)
  ) %>%
  filter(council_district == DISTRICT) %>%
  left_join(district_categories) %>%
  group_by(candidate_name, citywide_votes, citywide_pvote, election_name, cat_name) %>%
  summarise(
    votes = sum(VOTES),
    pvote = 100 * sum(VOTES) / sum(total_votes),
    total_votes = sum(total_votes)
  ) %>%
  group_by(candidate_name, election_name) %>%
  mutate(
    district_votes = sum(votes),
    district_pvote = 100 * sum(votes) / sum(total_votes)
  ) %>% select(
    election_name, candidate_name, citywide_pvote, district_pvote, cat_name, pvote, total_votes
  ) %>%
  gather(key="key", value="value", pvote, total_votes) %>%
  unite("key", cat_name, key) %>%
  spread(key, value)


neighborhood_summary %>%
  knitr::kable(
    digits=0, 
    format.args=list(big.mark=','),
    col.names=c("Election", "Candidate", "Citywide %", sprintf("District %s %%", DISTRICT), "University City %", "University City Turnout", "West & Southwest %", "West & Southwest Turnout")
  )

 

| Election | Candidate | Citywide % | District 3 % | University City % | University City Turnout | West & Southwest % | West & Southwest Turnout |
|---|---|---|---|---|---|---|---|
| 2015 City Council | Helen Gym | 8 | 8 | 16 | 18,521 | 5 | 47,400 |
| 2015 City Council | Isaiah Thomas | 7 | 8 | 7 | 18,521 | 8 | 47,400 |
| 2015 Mayor | Anthony Hardy Williams | 26 | 48 | 24 | 5,738 | 55 | 19,335 |
| 2015 Mayor | Jim Kenney | 56 | 39 | 62 | 5,738 | 33 | 19,335 |
| 2016 President | Bernie Sanders | 37 | 39 | 59 | 12,376 | 30 | 27,991 |
| 2016 President | Hillary Clinton | 63 | 61 | 41 | 12,376 | 70 | 27,991 |
| 2017 District Attorney | Larry Krasner | 38 | 51 | 73 | 7,125 | 36 | 11,113 |
| 2017 District Attorney | Tariq Karim El Shabazz | 12 | 15 | 5 | 7,125 | 22 | 11,113 |

Gym won 16% in University City, but only 5% in West & Southwest; Thomas ran an even 7% and 8%, respectively. Kenney won 62% in University City and only 33% in West & Southwest; Hardy Williams flipped that, with 24% and 55%. Krasner won an astounding 73% of the vote in University City (in a crowded race!), and only 36% in West & Southwest, though that was still good enough to win the neighborhood. El Shabazz won 5% and 22%.

Also, notice the dramatic change in relative turnout. In the 2015 Mayoral race, West & Southwest had 3.4 times the vote of University City. The dramatic turnout swing of 2017 shrank that to 1.6. West & Southwest still holds most of the voters (among substantially more households), but the relative proportions a candidate needs in each section shift.
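The turnout ratios can be reproduced directly from the turnout columns of the table. This is only a quick arithmetic sketch; the data frame and column names are illustrative and not part of the post’s pipeline.

```r
# Turnout by section, copied from the table above.
turnout <- data.frame(
  election = c("2015 Mayor", "2017 District Attorney"),
  ucity = c(5738, 7125),
  wsw = c(19335, 11113)
)

# How many West & Southwest votes were cast per University City vote.
turnout$ratio <- turnout$wsw / turnout$ucity

# University City's share of all votes cast in the district.
turnout$ucity_share <- turnout$ucity / (turnout$ucity + turnout$wsw)

print(turnout)
```

Rounded, the ratio falls from 3.4 to 1.6, and University City’s share of the district vote rises from about 23% to about 39%.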

The relative power of West and Southwest and University City

How much does the power shift between the two cohorts? Let’s do some math.

How much does a candidate need from each of the sections to win? Let \(t_i\) be the relative turnout in section \(i\), defined as its proportion of the district’s total votes. In the 2017 District Attorney race, \(t_i\) was 0.39 for University City and 0.61 for West & Southwest. Let \(p_{ic}\) be the proportion of the vote received by candidate \(c\) in section \(i\); in 2017, \(p_{ic}\) was 0.73 for Krasner in University City.

Then a candidate wins a two-way race whenever the turnout-weighted average of their vote proportions is greater than 0.5: \(\sum_i t_i \, p_{ic} > 0.5\).

Since we’ve divided District 3 into only two sections, we can draw this on a two-dimensional plot. On the x-axis, let’s put a candidate’s percent of the vote in University City, and on the y-axis, their percent of the vote in West & Southwest (assuming a two-person race). The candidate wins whenever the average of their proportions, weighted by \(t_i\), is greater than 50%. If the turnout looks like 2015, West & Southwest easily carries the District; if it looks like 2017, the sections carry much closer to equal weight. The dashed lines show the win boundaries; candidates to the top-right of the lines win.
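To make the weighted average concrete, here is a small sketch that applies it to the 2016 presidential primary, which was close to a true two-way race. The helper `district_share()` is a hypothetical name introduced for illustration; the percentages and turnouts are the rounded values from the table, so the results are approximate.

```r
# Turnout-weighted average of a candidate's section-level vote shares.
# `p` holds the candidate's percent of the vote in each section, and
# `votes` holds the total votes cast in each section.
district_share <- function(p, votes) {
  t <- votes / sum(votes)  # relative turnout of each section
  sum(t * p)
}

# 2016 President: total votes cast in each section (from the table).
votes_2016 <- c(ucity = 12376, wsw = 27991)

clinton <- district_share(c(ucity = 41, wsw = 70), votes_2016)
sanders <- district_share(c(ucity = 59, wsw = 30), votes_2016)

# A candidate carries the district when the weighted share exceeds 50.
print(c(clinton = clinton, sanders = sanders))
print(c(clinton_wins = clinton > 50, sanders_wins = sanders > 50))
```

The weighted shares come out near 61 and 39, matching the District 3 column of the table: Clinton lost University City but carried the district because West & Southwest cast more than twice as many votes.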

I’ll plot only the two-candidate vote for the top two candidates in the district for each race, to emulate a two-person race. (For City Council in 2015, I use Helen Gym and Isaiah Thomas, who were 4th and 5th in the district, and 5th and 6th citywide.)

View code
get_line <- function(x_total_votes, y_total_votes){
  ## solve p_x t_x+ p_y t_y > 50
  tot <- x_total_votes + y_total_votes
  tx <- x_total_votes / tot
  ty <- y_total_votes / tot

  slope <- -tx / ty
  intercept <- 50 / ty  # use 50 since proportions are x100
  c(intercept, slope)
}

line_2017 <- with(
  neighborhood_summary,
  get_line(
    `University City_total_votes`[candidate_name == "Larry Krasner"],
    `West & Southwest_total_votes`[candidate_name == "Larry Krasner"]
  )
)

## get the two-candidate vote
neighborhood_summary <- neighborhood_summary %>%
  group_by(election_name)  %>% 
  mutate(
    ucity_pvote_2cand = `University City_pvote` / sum(`University City_pvote`),
    wsw_pvote_2cand = `West & Southwest_pvote`/sum(`West & Southwest_pvote`)
  )

line_2015 <- with(
  neighborhood_summary,
  get_line(
    `University City_total_votes`[candidate_name == "Jim Kenney"],
    `West & Southwest_total_votes`[candidate_name == "Jim Kenney"]
  )
)

library(ggrepel)

ggplot(
  neighborhood_summary,
  aes(
    x=100*ucity_pvote_2cand,
    y=100*wsw_pvote_2cand
  )
) +
  geom_point() +
  geom_text_repel(aes(label=candidate_name)) +
  geom_abline(
    intercept = c(line_2015[1], line_2017[1]),
    slope = c(line_2015[2], line_2017[2]),
    linetype="dashed"
  ) +
  coord_fixed() + 
  scale_x_continuous(
    "University City percent of vote",
    breaks = seq(0,100,10)
  ) +
  scale_y_continuous(
    "West & Southwest percent of vote",
    breaks = seq(0, 100, 10)
  ) +
  annotate(
    geom="text",
    label=paste(c(2015, 2017), "turnout"),
    x=c(10, 8),
    y=c(
      line_2015[1] + 10 * line_2015[2],
      line_2017[1] + 8 * line_2017[2]
    ),
    hjust=0,
    vjust=-0.2,
    angle = atan(c(line_2015[2], line_2017[2])) / pi * 180,
    color="grey40"
  )+
  annotate(
    geom="text",
    x = 70,
    y=75,
    label="Candidate wins",
    fontface="bold",
    color = strong_green
  ) +
  geom_hline(yintercept = 50, color="grey50") +
  geom_vline(xintercept = 50, color="grey50")+
  expand_limits(x=100, y=80)+
  theme_sixtysix() +
  ggtitle(
    "The relative strength of W & SW Philly and U City",
    "Candidates to the top-right of the lines win."
  )

plot of chunk win_scatter

Hillary Clinton and Larry Krasner won the district in a landslide, with Clinton winning despite losing University City to Sanders. Helen Gym and Jim Kenney were in the turnout-dependent zone: they would win the district if turnout looked like 2017, and lose it if turnout looked like 2015 (and vice versa for Hardy Williams and Thomas).

So could a candidate who monopolized University City win? Maybe, but it’s hard. If turnout looks like 2017, then a candidate who wins 70% of the University City vote still needs to win 37% of the West and Southwest vote. If the turnout looks like 2015, the required W/SW vote jumps to 44%. Clinton and Krasner pulled off dominant victories that would win in any turnout climate; for Hardy Williams, Kenney, Thomas, and Gym, the neighborhoods’ relative turnout would have been decisive.
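Those thresholds follow from solving the win condition for the West & Southwest share. A minimal sketch, assuming a two-person race and using the section turnouts from the table (`required_wsw_share()` is a hypothetical helper, not a function from the post):

```r
# Solve t_ucity * p_ucity + t_wsw * p_wsw = 50 for p_wsw,
# where the turnout weights t_ucity and t_wsw sum to 1.
required_wsw_share <- function(p_ucity, votes_ucity, votes_wsw) {
  t_ucity <- votes_ucity / (votes_ucity + votes_wsw)
  (50 - t_ucity * p_ucity) / (1 - t_ucity)
}

# A candidate winning 70% of University City, under each turnout scenario.
need_2017 <- required_wsw_share(70, votes_ucity = 7125, votes_wsw = 11113)
need_2015 <- required_wsw_share(70, votes_ucity = 5738, votes_wsw = 19335)

print(round(c(turnout_2017 = need_2017, turnout_2015 = need_2015)))
```

The 2015 scenario uses the mayoral turnout, the same numbers the post’s plotting code uses for its 2015 line.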

Looking to May

I don’t know how Jamie Gauthier will fare in University City or in West & Southwest Philly, but my hunch is that she’s seeking the reformist, University City lane. But that’s a hard lane to win in. Even if she achieves Gym and Kenney percentages, she would need to additionally inspire turnout the way that Krasner did. Alternatively, she needs to pull substantially more support from West and Southwest than Gym and Kenney did. It’s possible, but a steep climb.

Which At Large City Councilors most polarized the vote?

May’s primary will include elections for Philadelphia City Council. The council consists of 17 councilors, ten of whom are elected by specific districts and seven of whom are At Large, elected by the city as a whole. Of those seven At Large seats, only five can go to the same party. In practice, that means five Democrats will win this primary, and then win landslide elections in November.

In advance of May, I’m going to be looking at what it takes to win a Democratic City Council At Large seat. Today, let’s look at how polarizing candidates are.

[Note: I’m starting today making my blog posts in RMarkdown. Click the View Code to see the R code!]

View code
## You can access the data at: 
## https://github.com/jtannen/jtannen.github.io/tree/master/data
# load("df_major_2017_12_01.Rda")

df_major$CANDIDATE <- gsub("\\s+", " ", df_major$CANDIDATE)
df_major$PARTY[df_major$PARTY == "DEMOCRATIC"] <- 'DEMOCRAT'

df_major <- df_major %>% 
  filter(
    election == "primary" &
      OFFICE == "COUNCIL AT LARGE" &
      PARTY %in% c("DEMOCRAT")
  )

df_total <- df_major %>% 
  group_by(CANDIDATE, year, PARTY) %>%
  summarise(votes = sum(VOTES)) %>%
  group_by(year, PARTY) %>%
  arrange(desc(votes)) %>%
  mutate(rank = rank(desc(votes)))

div_votes <- df_major %>%
  group_by(WARD16, DIV16, OFFICE, year) %>%
  summarise(div_votes = sum(VOTES))

Measuring Vote Polarization

One way to measure polarization is using the Gini coefficient, common in studying inequality. Suppose for each candidate we line up the precincts in order of their percent of the vote. We then move down the precincts, adding up the total voters and the votes for that candidate. We plot the curve, with the cumulative voters along the x axis, and the cumulative votes for that candidate along the y.

The curvature of that line is a measure of the inequality of the distribution of votes. In this case, I call that polarization. Suppose a candidate got 50% of the vote in every single precinct. Then the curve would just be a straight line with a slope of 0.5; there would be no polarization. Alternatively, if a candidate got zero of the votes from 90% of the precincts, but all of the vote in the remaining 10%, then the curve would be flat at 0 for the first 90% of the x-axis, but then bend and shoot up; a sharp curve and a lot of polarization.
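The two extremes described above can be sketched with made-up precincts. Everything here is hypothetical (ten equal-sized precincts and the helper name `vote_curve()`); it only illustrates how the curve is built, in raw counts so that the slope of 0.5 is visible.

```r
# Build the cumulative curve for one candidate: sort precincts by the
# candidate's share of the vote, then accumulate voters and votes.
vote_curve <- function(votes, voters) {
  ord <- order(votes / voters)
  data.frame(
    cum_voters = cumsum(voters[ord]),
    cum_votes = cumsum(votes[ord])
  )
}

# Ten hypothetical precincts with 100 voters each.
voters <- rep(100, 10)

# Even candidate: 50% of the vote in every precinct.
even <- vote_curve(votes = rep(50, 10), voters = voters)

# Concentrated candidate: nothing in nine precincts, everything in the tenth.
concentrated <- vote_curve(votes = c(rep(0, 9), 100), voters = voters)

print(even$cum_votes / even$cum_voters)  # constant slope of 0.5
print(concentrated$cum_votes)            # flat at 0, then jumps at the end
```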

View code
vote_cdf <- df_major %>%
  left_join(div_votes) %>%
  group_by(CANDIDATE, year) %>%
  mutate(
    p_vote_div = VOTES / div_votes,
    cand_vote_total = sum(VOTES)
  ) %>%
  arrange(p_vote_div) %>%
  mutate(
    cum_votes = cumsum(VOTES),
    vote_cdf = cum_votes / cand_vote_total,
    cum_denom = cumsum(div_votes) / sum(div_votes)
  ) 

ggplot(
  vote_cdf %>% 
    left_join(df_total) %>%
    filter(year == 2015 & rank <= 7),
  aes(x=cum_denom, y=cum_votes)
) + geom_line(
    aes(group=CANDIDATE, color=CANDIDATE),
    size=1
) +
  geom_text(
    data = vote_cdf %>% 
    left_join(df_total) %>%
    filter(year == 2015 & rank <= 7) %>%
      group_by(CANDIDATE) %>%
      filter(cum_votes == max(cum_votes)),
    aes(label = tolower(CANDIDATE)),
    x = 1.01,
    hjust = 0
  ) +
  xlab("Cumulative voters") +
  scale_y_continuous(
    "Cumulative votes for candidate",
    labels=scales::comma
  ) +
  scale_color_discrete(guide=FALSE)+
  expand_limits(x=1.3)+
  theme_sixtysix() +
  ggtitle(
    "Vote distributions for 2015 Council At Large",
    "Top seven finishers"
  )

plot of chunk gini

Above is that plot for the top seven At Large finishers in 2015 (remember that five Democrats can win). Helen Gym was the fifth. Interestingly, she also was the most polarizing: 49.4% of her votes came from her best 25% of divisions. For comparison, 38.3% of Derek Green’s votes came from his best 25% of divisions.

If we scale each candidate’s y-axis by their final total votes, the difference in curvature is even more stark.

View code
ggplot(
  vote_cdf %>% 
    left_join(df_total) %>%
    filter(year == 2015 & rank <= 7),
  aes(x=cum_denom, y=vote_cdf)
) + geom_line(
  aes(group=CANDIDATE, color=CANDIDATE),
  size=1
) +
  coord_fixed() +
  geom_abline(slope = 1, intercept = 0) +
  xlab("Cumulative voters") +
  ylab("Cumulative proportion of candidate's votes") +
  scale_color_discrete(guide = FALSE) +
  annotate(
    geom="text",
    y = c(0.45, 0.3),
    x = c(0.52, 0.6),
    hjust = c(1, 0),
    label = c("william k greenlee", "helen gym")
  ) +
  theme_sixtysix() +
    ggtitle(
    "Vote distributions for 2015 Council At Large",
    "Top seven finishers, scaled for total votes"
  )

plot of chunk gini_scaled

So Helen Gym snuck in four years ago, with a highly polarized vote. Is that common for new challengers? Not really. Usually, it’s hard to win without more even support.

To summarise the curvature into a single number, the Gini coefficient is defined as the area above the curve but below the 45-degree line, divided by the total area of the triangle. Notice that the more curved the line, the more area between the 45-degree line and the curve, and the higher the coefficient. If there is no inequality, the Gini coefficient is 0; if there’s complete inequality, it’s 1. Helen Gym’s Gini coefficient is 0.35; Bill Greenlee’s is 0.19.
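As a rough check on that definition, here is a self-contained sketch that computes the coefficient with the trapezoid rule on hypothetical precincts. The function name `gini_coefficient()` and the toy data are assumptions for illustration, separate from the post’s own pipeline.

```r
# Gini coefficient of a candidate's vote distribution across precincts.
# The area under the scaled cumulative curve is approximated with
# trapezoids; the triangle under the 45-degree line has area 1/2, so
# gini = (1/2 - area) / (1/2) = 1 - 2 * area.
gini_coefficient <- function(votes, voters) {
  ord <- order(votes / voters)
  x <- c(0, cumsum(voters[ord]) / sum(voters))  # cumulative share of voters
  y <- c(0, cumsum(votes[ord]) / sum(votes))    # cumulative share of votes
  area <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
  1 - 2 * area
}

# Ten hypothetical precincts with 100 voters each.
voters <- rep(100, 10)

gini_even <- gini_coefficient(votes = rep(50, 10), voters = voters)
gini_concentrated <- gini_coefficient(votes = c(rep(0, 9), 100), voters = voters)

print(c(even = gini_even, concentrated = gini_concentrated))
```

With only ten precincts, the fully concentrated candidate tops out at 0.9 rather than 1; the coefficient approaches 1 as the single winning precinct becomes a smaller share of all voters.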

Below I plot each candidate’s proportion of the vote on the x-axis (blue names are winners), and their Gini coefficient on the y-axis (higher values are more polarized).

View code
gini <- vote_cdf %>% 
  arrange(CANDIDATE, year, cum_denom) %>%
  group_by(CANDIDATE, year) %>%
  mutate(
    is_first = cum_denom == min(cum_denom),
    bin_width = cum_denom - ifelse(is_first, 0, lag(cum_denom)),
    avg_height = (vote_cdf + ifelse(is_first, 0, lag(vote_cdf)))/2,
    area = bin_width * avg_height
  ) %>% 
  summarise(
    gini = 1 - 2 * sum(area),
    total_votes = weighted.mean(p_vote_div, div_votes)
  )

ggplot(
  gini %>% left_join(df_total) %>% filter(rank <= 10), 
  aes(x=total_votes, y=gini)
) + 
  geom_text(
    aes(label=tolower(CANDIDATE), color=(rank<=5)),
    size = 3
  ) +
  scale_color_manual(
    "winner", 
    values=c(`TRUE` = strong_blue, `FALSE` = strong_red),
    guide = FALSE
  )+
  scale_x_continuous(
    "proportion of vote",
    expand=expand_scale(mult=0.2)
  ) +
  ylab("gini coefficient (higher means more polarization)")+
  facet_wrap(~year) +
  theme_sixtysix() +
  ggtitle("Total votes versus vote polarization",
          "Top ten finishers for City Council At Large. Winners in blue.")

plot of chunk gini_scatter

Helen Gym had the highest Gini coefficient of any winner in the last four elections, and no one else was close.

There are a few things going on here. First, the winners are usually incumbents, and incumbents probably benefit from name recognition across the city. All of the winners in 2011 were incumbents, for example.

But even the non-incumbents who won had more even support. Allan Domb had the second lowest Gini coefficient in 2015, and Derek Green the third. Greenlee and Bill Green had the lowest Gini coefficients when they won as challengers in 2007 (Greenlee was technically an incumbent from a 2006 Special Election).

There are a few ways to view Helen Gym’s polarization. Remember that this is unrelated to total proportion of the vote; she won the fifth most votes, more than candidates who had even and low support across the city. She did so by particularly consolidating her neighborhoods, mobilizing the wealthier, whiter progressive wards that formed her coalition (presumably with the incumbency, she will receive broader support this time around).

View code
# library(sf)
# divs <- st_read("2016_Ward_Divisions.shp", quiet = TRUE)

gym_vote <- divs %>% 
  left_join(
    df_major %>% 
      filter(year == 2015) %>% 
      mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>% 
      group_by(WARD_DIVSN) %>% 
      mutate(p_vote = VOTES / sum(VOTES)) %>% 
      filter(CANDIDATE == "HELEN GYM")
    )

ggplot(gym_vote)+ 
  geom_sf(
    aes(fill = p_vote * 100),
    color = NA
  ) +
  theme_map_sixtysix() +
  scale_fill_viridis_c("% of Vote") +
  ggtitle(
    "Helen Gym's percent of the vote, 2015",
    "Voters could vote for up to five At Large candidates"
  )

plot of chunk gym_vote

One perspective is that she won entirely on the support of whiter, wealthier liberals. Another is that she managed to squeeze the last drips of votes out of those neighborhoods, eking out her edge over candidates with similar city-wide votes. Notably, the common concern around a candidate with this base would be that she would ignore the lower income, Black and Hispanic neighborhoods that didn’t vote for her, but I don’t think that’s a common complaint lodged against the fierce public education advocate.

What coalitions win the City Council At Large seats?

One question I find fascinating is what coalitions candidates use to win. Gym clearly won with the wealthier white progressive wards, but candidates may also just as often win with support of the Black wards, or the more conservative Northeast and deep South Philly. In the upcoming months, I’m going to dig more into this question.