Election Day Links

Election day is May 21st. Here’s a list of ways to follow along with Sixty-Six Wards, including an exciting announcement.

The Turnout Tracker

When you vote, submit your Voter Number to https://bit.ly/sixtysixturnout.

Then follow along live at
https://jtannen.github.io/turnout_tracker_philadelphia.html

New: The Needle

When the polls close at 8pm, head over to https://jtannen.github.io/election_needle.html where I’ll process the live election returns, project out the precincts that haven’t voted yet, and produce real-time probabilities of winning for each candidate.

Who will win the Court of Common Pleas?

On May 21st, Philadelphia won’t just be voting for Mayor, City Council, and a few “row offices”. Besides those, we will also choose nine judges: two for the Superior Court, one for Municipal Court, and six for the Court of Common Pleas. (Really, this is just the primary. But the Common Pleas and Municipal nominees will almost certainly win in November).

I’ve spent time here before looking at the Court of Common Pleas. The court is responsible for the city’s major civil and criminal trials. Its judges are elected to ten-year terms. And we elect them by drawing out of a coffee can.

The result is that Philadelphia often elects judges who are unfit for the office. In 2015, Scott Diclaudio won; months later he would be censured for misconduct and negligence, and then get caught having given illegal donations in the DA Seth Williams probe. He was in the first position on the ballot. Lyris Younge was at the bottom of the first column that year and won. She has since been removed from Family Court for violating family rights and made headlines by evicting a full building of residents with less than a week’s notice.

I’ve looked before at the effect of ballot position on the Court’s elections: being in the first column nearly triples your votes. Today, I’ll use that model to simulate who will win in the upcoming race.

It’s easy to predict the Common Pleas Election

Predicting elections is hard, especially without surveys. When I tried it for November’s state house election, I could only make imprecise predictions, and even then had mixed results. Why would this time be any different?

The key is that voters know nothing about the race. In May, voters are selecting six Common Pleas judges from among twenty-five candidates. The median voter will know the name of exactly zero of them before they enter the booth.

This lack of knowledge means that structural components end up mattering a lot. What column your name is listed in, whether you’re endorsed in the Inquirer, or how many polling places your name is handed out on a piece of paper outside of, all dictate who will win. We can observe or guess each of those, and come up with pretty accurate predictions.

When I did this exercise two years ago, I got the number of winners from the first column, and the number endorsed by the DCC, exactly right (yes, it’s that easy).

Electing qualified judges

These races matter. Philadelphia regularly elects judges who should not be judges, granting them the authority over a courtroom that decides the city’s most important cases.

As a measure of judicial quality, I use the recommendations from the Judicial Commission of the Philadelphia Bar Association. The commission evaluates candidates by an interview, a questionnaire, and interviews with people who work with them. It then rates candidates as Recommended or Not Recommended. Usually, it recommends about 2/3 of the candidates–many more than can win–and is useful as a lower-bar measure of candidate quality. My understanding is that when a candidate isn’t recommended, there’s a significant reason, though the Commission’s exact findings are kept confidential.

The ratings are so useful that in 2015 the Philadelphia Inquirer stopped endorsing judicial candidates on its own, and began printing the Commission’s recommendations (this also makes the ratings much more important for candidates).

Recently, the Commission introduced a Highly Recommended category. Unfortunately, it’s too early to know how effective it’s been. In 2015 there were three Highly Recommended candidates, and all three won. But they didn’t do statistically significantly better than the plain old Recommended Candidates in terms of votes (albeit with just three observations). In 2017, there were no Highly Recommended candidates.

This time around, there are four Highly Recommended candidates: James Crulish, Anthony Kyriakakis, Chris Hall, and Tiffany Palmer (a fifth, Michelle Hangley, dropped out because of her unlucky ballot position). None of those four are in the first column, so this year could prove a useful measure of the Bar’s impact.

One note: candidates that do not submit questionnaires are not rated Recommended. Rather than reward the perverse incentive for candidates to not submit, I will consider candidates who have not yet submitted paperwork as Not Recommended.

Where will ballot position matter most?

When I analysed the determinants of Common Pleas voting, being in the first column nearly tripled a candidate’s votes. Endorsements from the Democratic City Committee (DCC) and the Inquirer doubled the votes (though the causal direction here is more dubious). Remember that the Inquirer has recently just adopted the Bar’s recommendations, so the importance of the Inquirer will be transferred to the Philadelphia Bar.

The ballot this year is wide. There are just four rows and seven columns. With 25 candidates vying for 6 spots, a number of later column candidates will almost certainly win.

Two years ago when I simulated the race, I did so at the city-level, ignoring neighborhood patterns. But we might see vastly disproportionate turnout in some neighborhoods, and it happens that those are the neighborhoods where recommended candidates do best. So let’s be more careful. First, how much does each determinant of candidates’ votes vary by neighborhood?

View code
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)
library(readr)
library(forcats)

source("../../admin_scripts/util.R")

ballot <- read.csv("../../data/common_pleas/judicial_ballot_position.csv")
ballot$name <- tolower(ballot$name)
ballot$name <- gsub("[[:punct:]]", " ", ballot$name)
ballot$name <- trimws(ballot$name)

years <- seq(2009, 2017, 2)
dfs <- list()
for(y in years){
dfs[[as.character(y)]] <- read_csv(paste0("../../data/raw_election_data/", y, "_primary.csv")) %>% 
mutate(
year = y,
CANDIDATE = tolower(CANDIDATE),
CANDIDATE = gsub("\\s+", " ", CANDIDATE)
) %>%
filter(grepl("JUDGE OF THE COURT OF COMMON PLEAS-D", OFFICE))
print(y)
}

df <- bind_rows(dfs)

df <- df %>% 
mutate(WARD = sprintf("%02d", WARD)) %>%
group_by(WARD, year, CANDIDATE) %>% 
summarise(VOTES = sum(VOTES))

df_total <- df %>% 
group_by(year, CANDIDATE) %>% 
summarise(VOTES = sum(VOTES))

election <- data.frame(
year = c(2009, 2011, 2013, 2015, 2017),
votefor = c(7, 10, 6, 12, 9)
)

election <- election %>% left_join(
ballot %>% group_by(year) %>% 
summarise(
nrows = max(rownumber),
ncols = max(colnumber), 
ncand = n(),
n_philacomm = sum(philacommrec),
n_inq = sum(inq),
n_dcc = sum(dcc)
)
)

df_total <- df_total %>% 
left_join(election) %>%
group_by(year) %>%
arrange(desc(year), desc(VOTES)) %>%
mutate(finish = 1:n()) %>%
mutate(winner = finish <= votefor)

df_total <- df_total %>% inner_join(
ballot,
by = c("CANDIDATE" = "name", "year" = "year")
)

df_total <- df_total %>%
group_by(year) %>%
mutate(pvote = VOTES / sum(VOTES))

df_total <- df_total %>%
filter(CANDIDATE != "write in") 

prep_df_for_lm <- function(df, use_candidate=TRUE){
df <- df %>% mutate(
rownumber = fct_relevel(factor(as.character(rownumber)), "3"),
colnumber = fct_relevel(factor(as.character(colnumber)), "3"),
col1 = colnumber == 1,
col2 = colnumber == 2,
col3 = colnumber == 3,
row1 = rownumber == 1,
row2 = rownumber == 2,
is_rec = philacommrec > 0,
is_highly_rec = philacommrec==2,
inq=inq>0
)
if(use_candidate)
df <- df %>% mutate(
candidate_year = paste(CANDIDATE, year, sep="::")
)
return(df)
}

df_complemented <- df %>% 
filter(CANDIDATE != "write in") %>%
group_by(WARD) %>%
mutate(pvote = VOTES / sum(VOTES)) %>%
inner_join(
df_total %>% prep_df_for_lm(),
by = c("year", "CANDIDATE"),
suffix = c("", ".total")
) 

# fit_model <- function(df){
#   lmfit <- lm(
#     log(pvote + 0.001) ~ 
#       row1 + row2 +
#       # col1*I(votefor - nrows) + 
#       # col2*I(votefor - nrows) + 
#       # col3*I(votefor - nrows) +
#       I(gender == "F") +
#       col1 + col2 + col3 +
#       inq + dcc + 
#       is_rec + is_highly_rec +
#       factor(year),
#     data = df_total %>% prep_df_for_lm()
#   )
#   return(lmfit)
# }
# 
# lmfit <- fit_model(df_total)
# summary(lmfit)
View code
library(lme4)

## better opt: https://github.com/lme4/lme4/issues/98
library(nloptr)
defaultControl <- list(
algorithm="NLOPT_LN_BOBYQA",xtol_rel=1e-6,maxeval=1e5
)
nloptwrap2 <- function(fn,par,lower,upper,control=list(),...) {
for (n in names(defaultControl)) 
if (is.null(control[[n]])) control[[n]] <- defaultControl[[n]]
res <- nloptr(x0=par,eval_f=fn,lb=lower,ub=upper,opts=control,...)
with(res,list(par=solution,
fval=objective,
feval=iterations,
conv=if (status>0) 0 else status,
message=message))
}

rfit <- lmer(
log(pvote + 0.001) ~ 
(1 | candidate_year)+
row1 + row2 +
I(gender == "F") +
col1 + col2 +
dcc + 
is_rec + is_highly_rec +
factor(year) +
(
# row1 + row2 +
# col1*I(votefor - nrows) + 
# col2*I(votefor - nrows) + 
# col3*I(votefor - nrows) +
I(gender == "F") +
col1 + col2 + #col3 +
dcc +
is_rec + is_highly_rec 
# factor(year)
| WARD
),
df_complemented
)

ranef <- as.data.frame(ranef(rfit)$WARD) %>% 
rownames_to_column("WARD")  %>%
gather("variable", "random_effect", -WARD) %>%
mutate(
fixed_effect = fixef(rfit)[variable],
effect = random_effect + fixed_effect
)

Recommended candidates receive about 1.8 times as many votes on average, drawing almost all of that advantage from Center City and Chestnut Hill & Mount Airy. While overall we didn’t see a benefit to being Highly Recommended, in the neighborhood drill-down, we do see tentative evidence that those candidates did even better in the wealthier wards (a highly recommended candidate would receive the sum of the Recommended + Highly Recommended effects below).

View code
library(sf)

wards <- read_sf("../../data/gis/2016/2016_Wards.shp")

ward_effects <- wards %>% 
mutate(WARD = sprintf("%02d", WARD)) %>%  
left_join(
ranef,
by=c("WARD" = "WARD")
)

format_effect <- function(x){
paste0("x", round(exp(x), 1))
}

fill_min <- ward_effects %>%
filter(
variable %in% c(
"col1TRUE", "col2TRUE", "dcc", "is_recTRUE", "is_highly_recTRUE"
)
)  %>%
with(c(min(effect), max(effect)))

format_variables <- c(
is_recTRUE="Recommended",
is_highly_recTRUE="Highly Recommended",
dcc = "Dem. City Committee Endorsement",
col1TRUE = "First Column",
col2TRUE = "Second Column"
)

ward_effects$variable_name <- factor(
format_variables[ward_effects$variable],
levels = format_variables
)

ggplot(
ward_effects %>% 
filter(variable_name %in% c(format_variables[1:2]))
) + 
geom_sf(aes(fill=effect), color = NA) +
facet_wrap(~variable_name) +
scale_fill_viridis_c(
"Multiplicative\nDifference in Votes", 
labels=format_effect, 
breaks = seq(-2, 3, 0.4)
) +
theme_map_sixtysix() %+replace%
theme(legend.position="right") +
expand_limits(fill = fill_min) +
ggtitle("Recommended candidates do better\n  in wealthier wards") 

What’s going on in the other wards? The Democratic Party is especially important, especially in the traditionally-strong Black wards. (Note, I can’t identify here if that’s because they strongly adopt the party’s endorsement, or if the party endorses the candidates who would already do well). Interestingly, the party wasn’t so important in the Hispanic wards of North Philly or the Northeast.

View code
ggplot(
ward_effects %>% 
filter(variable_name %in% c(format_variables[3]))
) + 
geom_sf(aes(fill=effect), color = NA) +
facet_wrap(~variable_name) +
scale_fill_viridis_c(
"Multiplicative\nDifference in Votes", 
labels=format_effect, 
breaks = seq(-2, 3, 0.4)
) +
theme_map_sixtysix() %+replace%
theme(legend.position="right") +
expand_limits(fill = fill_min) +
ggtitle("Party-endorsed candidates\ndo better in predominantly-Black wards")

Unfortunately, all of these effects are swamped by ballot position. Candidates in the first column receive twice as many votes in every single type of ward, but especially many in lower-income wards.

View code
ggplot(
ward_effects %>% 
filter(variable_name %in% c(format_variables[4:5]))
) + 
geom_sf(aes(fill=effect), color = NA) +
facet_wrap(~variable_name) +
scale_fill_viridis_c(
"Multiplicative\nDifference in Votes", 
labels=format_effect, 
breaks = seq(-2, 3, 0.4)
) +
theme_map_sixtysix() %+replace%
theme(legend.position="right") +
expand_limits(fill = fill_min) +
ggtitle(
"First-column candidates do better everywhere", 
"Relative to third column or later"
)

Simulating the election

The task of predicting the election comes down to using these correlations, and then randomly sampling uncertainty of the correct size.

I use each candidate’s ballot position and endorsements to come up with a baseline estimate of how they’ll do in each ward. There is a lot of uncertainty for a given candidate, so I add random noise to each candidate (candidate-level effects that aren’t explained by my model have a standard deviation of about +/- 30% of their votes.)

I scale up the ward performance by my turnout projection. I’m using my high-turnout projections, which assume that the post-2016 surge continues in Center City and its ring, and will in general help recommended candidates, who do better in those wealthier wards.

View code
turnout_2019 <- read.csv(
"../turnout_2019_primary/turnout_projections_2019.csv"
) %>%
mutate(WARD = sprintf("%02d", WARD16)) %>%
group_by(WARD) %>%
summarise(
high_projection = sum(high_projection, na.rm = TRUE),
low_projection = sum(low_projection, na.rm = TRUE)
)

replace_na <- function(x, r=0) ifelse(is.na(x), r, x)

df_2019 <- ballot %>% 
filter(year == 2019) %>%
mutate(
philacommrec = replace_na(philacommrec),
dcc = replace_na(dcc),
inq = (philacommrec > 0),
year = 2017  ## fake year to trick lm
) %>%
prep_df_for_lm(use_candidate = FALSE) %>%
left_join(
expand.grid(
name = unique(ballot$name),
WARD = unique(turnout_2019$WARD)
)
) %>% left_join(turnout_2019)


## pretend it's one candidate, but then marginalize over candidates
df_2019$log_pvote <- predict(
rfit,
newdata = df_2019 %>% 
mutate(candidate_year = df_complemented$candidate_year[1])
)

df_2019 <- df_2019 %>%
mutate(pvote_prop = exp(log_pvote))

sd_cand <- sd(ranef(rfit)$candidate_year$`(Intercept)`)
simdf <- expand.grid(
sim = 1:1000,
name = unique(df_2019$name)
) %>%
mutate(cand_re = rnorm(n(), sd = sd_cand))

## https://econsultsolutions.com/simulating-the-court-of-common-pleas-election/
votes_per_voter <- 4.5

simdf <- df_2019 %>%
left_join(simdf) %>%
mutate(pvote_prop_sim = pvote_prop * exp(cand_re)) %>%
group_by(WARD, sim) %>%
mutate(pvote = pvote_prop_sim / sum(pvote_prop_sim)) %>%
group_by() %>%
mutate(votes = high_projection * votes_per_voter * pvote) %>%
group_by(sim, name) %>%
summarise(votes = sum(votes)) %>%
left_join(ballot %>% filter(year == 2019)) %>%
prep_df_for_lm(use_candidate = FALSE)

simdf <- simdf %>%
group_by(sim) %>%
mutate(
vote_rank = rank(desc(votes)),
winner = rank(vote_rank) <= 6
)

remove_na <- function(x, r=0) return(ifelse(is.na(x), r, x))

winner_df <- simdf %>% 
group_by(sim) %>%
summarise(
winners_rec = sum(is_rec * winner),
winners_highly_rec = sum(is_highly_rec * winner),
winners_col1 = sum(col1 * winner),
winners_col2 = sum(col2 * winner),
winners_col3 = sum(col3 * winner),
winners_dcc = sum(remove_na(dcc) * winner),
winners_women = sum((gender == "F") * winner)
)

Under the hood, the model has an estimate for each candidate. But I’m not totally comfortable with blasting those out (and what feedback loops that might cause), so let’s look at the high-level predictions instead.

View code
col_sim <- winner_df %>%
select(winners_col1, winners_col2) %>%
mutate(
`Third Column or later` = 6 - winners_col1 - winners_col2
) %>%
rename(
`First Column` = winners_col1,
`Second Column` = winners_col2
)

rec_sim <- winner_df %>%
select(winners_rec, winners_highly_rec, winners_dcc) %>%
mutate(
`Not Recommended` = 6 - winners_rec
) %>%
rename(
`All Recommended` = winners_rec,
`Highly Recommended` = winners_highly_rec,
`DCC Endorsed` = winners_dcc
)

gender_sim <- winner_df %>%
select(winners_women) %>%
mutate(
`Men` = 6 - winners_women
) %>%
rename(
`Women` = winners_women
)

plot_winners <- function(
sim_df, 
title, 
facet_order,
colors
){
gathered_df <- sim_df %>%
gather("facet", "n_winners") %>%
mutate(
facet = factor(
facet,
facet_order
)
) %>%
group_by(facet, n_winners) %>%
count() %>%
group_by(facet) %>%
mutate(prop = n / sum(n))

facet_lev <- levels(gathered_df$facet)
names(colors) <- facet_lev
ggplot(
gathered_df,
aes(x=n_winners)
) +
geom_bar(aes(y = prop, fill = facet), stat="identity") +
theme_sixtysix() +
expand_limits(x=c(0,7)) +
scale_x_continuous("Count of winners", breaks = 0:7) +
ylab("Proportion of simulations") +
scale_fill_manual(values = colors, guide=FALSE)+
facet_grid(facet~.) +
geom_vline(xintercept=6, linetype="dashed") +
ggtitle(title)
}

The model is really optimistic about how many Recommended candidates win, mostly because there’s only one Not Recommended candidate in the first two columns. In 66% of simulations all six winners are Recommended, and in 33% all but one are (Jon Marshall at the bottom of the first column is usually the lone Not Recommended winner).

View code
plot_winners(
rec_sim, 
"Simulations by Recommendation", 
c("Highly Recommended", "All Recommended", "Not Recommended", "DCC Endorsed"),
c(strong_blue, strong_green, strong_red, strong_grey)
)

Highly Recommended candidates do less well; we get no Highly Recommended winners in 46% of simulations, and only one in another 47%. Remember that the model doesn’t think that being Highly Recommended helps more than just regular Recommended, and this year’s candidates have bad ballot position. Their performance this year will be a barometer for the power of the Bar’s endorsements; getting two winners (let alone three or four) would be a huge achievement (and presumably good for the citizens of Philadelphia, too).

DCC endorsees win an average of 3.1 of the six seats.

Of course, the true determinant is the first column.

View code
plot_winners(
col_sim, 
"Simulations by Column Position", 
c("First Column", "Second Column", "Third Column or later"),
c(strong_purple, strong_orange, strong_grey)
)

The first column produces 2.5 winners on average; the most likely outcome is three of the first-column candidates winning (45% of simulations), the second most likely is two (40%). The second column still produces 1.0 winners on average, with the remaining 2.4 winners coming from the final five columns.

View code
wincount_df <- simdf %>% 
group_by(sim) %>%
mutate(pvote= votes/sum(votes)) %>%
filter(vote_rank == 6)

How many votes will it take to win? The average sixth-place winner wins 5.1% of the vote (remember that candidates can vote for multiple candidates). Assuming that 218,000 people vote, and an average of 4.5 candidates selected per voter, that comes out to 50,000 votes.

View code
ggplot(
wincount_df,
aes(x = pvote * 100)
) +
geom_histogram(
aes(y=stat(count) / sum(stat(count))),
boundary=5,
binwidth=0.2,
fill = strong_green
) +
ylab("Proportion of Simulations") +
xlab("Percent of vote received by sixth place") +
geom_vline(xintercept = 100 * mean(wincount_df$pvote), color = "black") +
annotate(
"text", 
label = sprintf("Mean = %.1f %%", 100 * mean(wincount_df$pvote)),
x = 100 * mean(wincount_df$pvote),
y = 0.05,
angle = 90,
vjust = 1.1
)+ 
theme_sixtysix() +
ggtitle("Win Count for Common Pleas") 

This year may be… not too bad?

In 2015, three Not Recommended candidates became ten year judges. In 2017, three more did. This year, probably at worst only one will. Why? Mostly luck; all six of those unqualified winners were in the first column, and this year seven of the eight candidates in the first two columns are Recommended.

Instead, the open question is just how qualified our judges will be. Will we stay on par with the past, which would see my model’s predicted zero or only one Highly Recommended candidate win? Or will the Bar’s Highly Recommended ratings assert themselves, and prove a bigger player this year?

How many people will vote in the Primary?

One question I get a lot is what we should expect for turnout in this primary. We have a lot of mixed signals, and it can be hard to intuit what they all mean together.

The signals include:

  • This is a Primary with an incumbent mayor, which typically sees a low 140,000 voters.
  • But turnout has soared after 2016. It was was 66% higher for the 2017 primary than the prior three DA primaries, and 35% higher for the 2018 general than the prior three Gubernatorial generals.
  • The turnout surge was especially large in neighborhoods that voted for Krasner.
  • The Democratic Primary is currently at 29 Council At Large and 13 Commissioner candidates, versus 16 and 6 in 2015.
  • We have contested primaries in 5 of 10 council districts: 1, 2, 3, 4, and 7. In 2015, the only contested districts were Kanyatta Johnson’s 2nd and Maria Quiñones-Sánchez’s 7th.

What does it all mean? In this post, I’ll sort through the recent trends, and make a prediction (or really, two) for what turnout will look like.

View code
library(ggplot2)
library(dplyr)
library(ggthemes)
library(scales)
library(colorspace)
library(tidyr)
library(sf)


setwd("C:/Users/Jonathan Tannen/Dropbox/sixty_six/posts/turnout_2019_primary/")

load("../../data/processed_data/df_major_2017_12_01.Rda")

source("../../admin_scripts/util.R")

turnout <- df_major %>%
  filter(OFFICE %in% c(
    "PRESIDENT OF THE UNITED STATES",
    "GOVERNOR",
    "MAYOR",
    "DISTRICT ATTORNEY"
  )) %>%
  group_by(WARD16, DIV16, year, election) %>%
  summarise(VOTES = sum(VOTES))


df_2018 <- read.csv("../../data/raw_election_data/2018_general.csv")
names(df_2018)<- c(
  "WARD16", "DIV16", "TYPE", "OFFICE", 
  "CANDIDATE", "PARTY", "VOTES"
)
df_2018$WARD16 <- sprintf("%02d", df_2018$WARD16)
df_2018$DIV16 <- sprintf("%02d", df_2018$DIV16)

df_2018 <- df_2018 %>% 
  filter(OFFICE == "GOVERNOR AND LIEUTENANT GOVERNOR") %>%
  group_by(WARD16, DIV16) %>%
  summarise(VOTES = sum(VOTES)) 

df_2018$election <- "general"
df_2018$year <- "2018"

df_2018_primary <- read.csv("../../data/raw_election_data/2018_primary.csv")
# head(df_2018_primary)
names(df_2018_primary)<- c(
  "WARD16", "DIV16", "TYPE", "OFFICE", 
  "CANDIDATE", "PARTY", "VOTES"
)
df_2018_primary$WARD16 <- sprintf("%02d", df_2018_primary$WARD16)
df_2018_primary$DIV16 <- sprintf("%02d", df_2018_primary$DIV16)
df_2018_primary <- df_2018_primary %>% 
  filter(PARTY == "DEMOCRATIC") %>%
  mutate(OFFICE = gsub("(.*)-DEM", "\\1", OFFICE)) df_2018_primary <- df_2018_primary %>% filter(OFFICE == "GOVERNOR") %>% group_by(WARD16, DIV16) %>% summarise(VOTES = sum(VOTES)) df_2018_primary$election <- "primary" df_2018_primary$year <- "2018" turnout <- bind_rows(turnout, df_2018) turnout <- bind_rows(turnout, df_2018_primary) turnout_wide <- turnout %>% unite(key, election, year) %>% spread(key = key, value = VOTES) cycles <- data.frame( year = 2002:2021, cycle = rep(c("Governor","Mayor","President","District Attorney"), 5), senate = rep(c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE), 4)[1:20] ) turnout_total <- turnout %>% group_by(year, election) %>% summarise(VOTES = sum(VOTES)) turnout_total <- turnout_total %>% left_join( cycles %>% mutate(year = as.character(year)), by = "year" ) 

The Typical Turnout in an Incumbent Mayoral Primary

First, let’s consider the boring historical, pre-2016 baseline. Mayoral primaries have the second highest turnout in the city, second only to Presidential ones, but much lower turnout when there’s an incumbent mayor.

View code
annotation_df <- list(
  primary = tribble(
    ~year, ~VOTES, ~cycle, ~hjust,
    13, 75e3, "District Attorney", 0,
    14, 375e3, "President", 1,
    14, 180e3, "Governor", 0,
    14, 270e3, "Mayor", 0.5
  ),
  general = tribble(
    ~year, ~VOTES, ~cycle, ~hjust,
    13.5, 120e3, "District Attorney", 0,
    12, 675e3, "President", 0,
    14.2, 400e3, "Governor", 0,
    14.3, 248e3, "Mayor", 0
  )
)


senate_label_pos <- list(
  primary = tribble(
    ~year, ~VOTES, ~senate, ~label,
    2.3, 25e3, TRUE, "Senate",
    2.3, 50e3, FALSE, "Non-Senate"
  ),
  general = tribble(
    ~year, ~VOTES, ~senate, ~label,
    2.3, 37.5e3, TRUE, "Senate",
    2.3, 75e3, FALSE, "Non-Senate"
  )
)


turnout_plot <- function(use_election){
  ggplot(
    turnout_total %>% filter(election == use_election), 
    aes(
      x = year, 
      y = VOTES,
      color = cycle,
      group = interaction(cycle, election)
    )
  ) + 
    geom_point(size = 3, aes(shape = senate)) +
    geom_line() +
    expand_limits(y = 0) +
    scale_y_continuous("Votes Cast", labels = comma) +
    theme_sixtysix() +
    theme(axis.title.x = element_blank()) +
    geom_point(
      data = senate_label_pos[[use_election]],
      x = 2,
      aes(shape = senate),
      color = "grey20",
      group = NA,
      size = 3
    )+
    geom_text(
      data = senate_label_pos[[use_election]],
      aes(label=label),
      color = "grey20",
      group = NA,
      size = 4,
      hjust = 0
    )+
    geom_text(
      data = annotation_df[[use_election]],
      aes(label=cycle, hjust=hjust, color=cycle),
      group = NA,
      size = 4,
      fontface="bold"
    )+
    scale_shape_discrete(guide = FALSE)+
    scale_color_discrete(guide = FALSE)+
    ggtitle(sprintf(
      "Turnout in Philadelphia %s",
      ifelse(use_election == "general", "Generals", "Democratic Primaries")
    ))
}

In the 2011 primary, with Nutter running for reelection, 166,000 Philadelphians cast a vote. In 2003, a year in which Street ran unopposed in the primary but was divisive enough to draw a strong challenge in the general, 113,000 voted in the primary.

View code
turnout_plot("primary")

plot of chunk primary_turnout

We might start with a baseline guess of the average: 140,000 votes. We might, that is, if we hadn’t seen the last two years.

The post-2016 surge

Turnout since 2016 has fundamentally changed from the years before. In the plot above, notice that 165,000 Philadelphians voted in the 2017 District Attorney primary, 2.6 times the turnout of four years before (and 1.7 times the average turnout of the prior three DA primaries). Then the 2018 general turnout was astromical, approaching Presidential election numbers. The 554,000 votes cast was 36% higher than the 409,000 average of the four prior Gubernatorial generals.

View code
turnout_plot("general")

plot of chunk general_turnout

That turnout surge wasn’t uniform, but disproportionately occured in the gentrifying ring around Center City: University City, South Philly, and the River Wards (which I’ll call Krasner’s Base, as we’ll see later). Turnout was 3x the typical turnout for a DA election in those wards in 2017, and 2x the typical Gubernatorial turnout in 2018.

View code
da_results <- df_major %>%
  filter(
    election == "primary" & PARTY == "DEMOCRATIC" & OFFICE == "DISTRICT ATTORNEY"
  ) %>%
  group_by(year, WARD16, DIV16, CANDIDATE) %>%
  summarise(VOTES = sum(VOTES)) %>%
  group_by(year, WARD16, DIV16) %>%
  mutate(
    total_votes = sum(VOTES),
    pct_vote = VOTES / total_votes
  )

da_results$candidate_name <- format_name(da_results$CANDIDATE)

turnout_wide <- turnout_wide %>%
  mutate(
    typical_turnout_da = (primary_2013 + primary_2009 + primary_2005)/3,
    typical_turnout_governor = (general_2014 + general_2010 + general_2006 + general_2002)/4
  )

krasner_results <- da_results %>% 
  filter(candidate_name == "Lawrence S Krasner") %>%
  left_join(turnout_wide)
View code
library(sf)

divs <- st_read("../../data/gis/2016/2016_Ward_Divisions.shp", quiet = TRUE)
divs <- divs %>% st_transform(2272)
wards <- st_read("../../data/gis/2016/2016_Wards.shp",  quiet = TRUE)
wards <- wards %>% st_transform(2272)

divs$area <- as.numeric(st_area(divs$geometry)) / (5280^2)
wards$area <- wards$AREA_SFT / (5280^2)

divs <- st_simplify(divs, 500)
divs <- divs %>% mutate(
  WARD16 = sprintf("%02d", WARD),
  DIV16 = sprintf("%02d", DIVSN)
) %>% select(-WARD, -DIVSN)
wards$WARD16 = sprintf("%02d", asnum(wards$WARD))
View code
krasner_results_wards <- krasner_results %>%
  group_by(WARD16) %>%
  summarise(
    turnout_2017 = sum(primary_2017),
    typical_turnout_da = sum(typical_turnout_da),
    pct_vote = weighted.mean(pct_vote, w=total_votes)
  )

turnout_wide_wards <- turnout_wide %>%
  group_by(WARD16) %>%
  summarise_at(
    .funs = funs(sum(., na.rm = TRUE)), 
    vars(
      starts_with("primary_"), 
      starts_with("general_"), 
      starts_with("typical_turnout_")
    )
  )

krasner_turnout_per_mile <- ggplot(
  # divs %>%
  #   left_join(turnout_wide)
  wards %>% 
    left_join(turnout_wide_wards)
) +
  geom_sf(
    aes(
      fill = pmin(primary_2017 / area, 10e3)
    ), 
    color = NA
  ) +
  scale_fill_viridis_c(
    "Votes per mile",
    labels = scales::comma
  ) +
  theme_map_sixtysix() +
  ggtitle("Votes per mile in the 2017 primary")

krasner_turnout_change <- ggplot(
  # divs %>%
  #   left_join(turnout_wide)
  wards %>%
    left_join(turnout_wide_wards)
) +
  geom_sf(
    aes(fill = pmin(primary_2017 / typical_turnout_da, 4)), 
    color = NA
  ) +
  expand_limits(fill = c(1,3)) +
  scale_fill_viridis_c(
    "Turnout in 2017/\n Typical DA Turnout",
    # breaks = 0:5,
    labels = function(x) paste0(x, "x")
  ) +
  theme_map_sixtysix() +
  ggtitle("Surged nearly 3x in Krasner's base")

gridExtra::grid.arrange(
  krasner_turnout_per_mile,
  krasner_turnout_change, 
  nrow=1  
)

plot of chunk krasner_turnout_maps

View code
votes_per_mile_2018 <- ggplot(
  # divs %>%
    # left_join(turnout_wide)
    wards %>% left_join(turnout_wide_wards)
) +
  geom_sf(
    aes(
      fill = pmin(general_2018 / area, 20e3)
    ), 
    color = NA
  ) +
  scale_fill_viridis_c(
    "Votes per mile",
    labels = scales::comma
  ) +
  theme_map_sixtysix() +
  ggtitle("Votes per mile in the 2018 general")

turnout_change_2018 <- ggplot(
  # divs %>%
  #   left_join(turnout_wide)
  wards %>%
    left_join(turnout_wide_wards)
) +
  geom_sf(
    aes(fill = pmin(general_2018 / typical_turnout_governor, 4)), 
    color = NA
  ) +
  expand_limits(fill = c(1, 3)) +
  scale_fill_viridis_c(
    "Turnout in 2018/\n Typical Governor Turnout",
    labels = function(x) paste0(x, "x")
  ) +
  theme_map_sixtysix() +
  ggtitle(
    'Surged "only" 2x in Krasner\'s base'
  )

gridExtra::grid.arrange(
  votes_per_mile_2018,
  turnout_change_2018, 
  nrow=1  
)

plot of chunk turnout_maps_2018

Why do I label those wards as Krasner’s base? Because they’re exactly where Krasner did strongest, winning over 60% of the votes (in a multi-candidate race!). Here’s the map of the District Attorney’s votes (mapped by Division).

View code
krasner_pct_map <- ggplot(
  divs %>% left_join(krasner_results)
  # wards %>% 
  #   left_join(krasner_results_wards)
) +
  geom_sf(aes(fill = 100 * pct_vote), color = NA) +
  scale_fill_viridis_c("Percent\n of vote") +
  theme_map_sixtysix() +
  ggtitle("Percent of vote for Krasner")

print(krasner_pct_map)

plot of chunk krasner_results

The story is clear: there is a specific population that used to never vote that’s been activated by 2016. These are the predominantly young, predominantly White residents of rapidly gentrifying wards. The votes in those neighborhoods are converging to the high-turnout behaviors regularly seen in core Center City and the Northwest.

Plotting the increase in turnout in 2017 versus Krasner’s percent of the vote shows that divisions everywhere voted at least 1.5x as much as the prior three DA races, but over twice as much where Krasner won more than 50% of the vote.

View code
ggplot(
  krasner_results,
  aes(
    y = primary_2017 / typical_turnout_da,
    x = 100 *pct_vote
  )
) +
  geom_point(
    aes(
      size = typical_turnout_da
    ),
    alpha = 0.3,
    color = strong_green,
    pch = 16
  ) +
  geom_smooth(
    aes(weight = typical_turnout_da), 
    color = "grey10"
  )+
  scale_size_area("Division's average votes\nin 2005, 2009, 2013") +
  scale_y_continuous(
    labels = function(x) return(paste0(x, "x")),
    limits = c(0, 8),
    breaks = seq(0, 10, 2)
  ) +
  geom_hline(yintercept = 1, linetype="dashed") +
  annotate(
    "text", 
    x=55, y=0.95,
    label="2017 turnout = typical turnout",
    hjust=0, vjust=1
  ) +
  labs(
    title = "Krasner's popularity also drove turnout",
    subtitle = "Democratic Primary turnout was 1.5x in low-support divisions, but 3x in high",
    y = "Votes in 2017 / Average votes in 2005, 2009, 2013",
    x = "Percent of vote for Krasner"
  )+
  theme_sixtysix()

plot of chunk krasner_scatter

District Council Races

Finally, what do the competitive Council races imply?

I don’t know what to do with the increase in candidates for At Large races, which clearly represents something but is such an outlier that there’s no responsible way to use it. But the increase in competitive district races will have a clear impact on the election, which we can measure.

View code
load_council_races <- function(year){
  df_year <- read.csv(
    paste0("../../data/raw_election_data/",year,"_primary.csv")
  )

  df_year <- df_year %>%
    mutate(
      WARD16 = sprintf("%02d", asnum(WARD)),
      DIV16 = sprintf("%02d", asnum(DIVISION))
    ) %>%
    group_by(WARD16, DIV16, OFFICE, CANDIDATE, PARTY) %>%
    summarise(VOTES = sum(VOTES)) %>%
    group_by()


  district_regex <- "DISTRICT COUNCIL(-|\\s)([0-9]+)[A-Z]+ DIST(RICT)?-D(EM)?"
  council_districts <- df_year %>%
    filter(grepl(district_regex, OFFICE)) %>%
    mutate(
      district = asnum(gsub(district_regex, "\\2", OFFICE))
    ) %>%
    mutate(
      candidate_name = format_name(CANDIDATE),
      last_name = get_last_name(candidate_name),
      year = year
    )
  return(council_districts)
}

council_2015 <- load_council_races(2015)
council_2011 <- load_council_races(2011)
council_2007 <- load_council_races(2007)
council_2003 <- load_council_races(2003)

council_df <- bind_rows(
  council_2015, council_2011, council_2007, council_2003
)

council_totals <- council_df %>%
  group_by(year, candidate_name, district) %>%
  summarise(votes = sum(VOTES)) %>%
  arrange(year, district, desc(votes))

council_races <- council_totals %>%
  group_by(year, district) %>%
  summarise(
    winner = candidate_name[which.max(votes)],
    pct_winner = max(votes) / sum(votes),
    is_competitive = pct_winner < 0.9
  ) %>%
  group_by()

council_turnout <- turnout %>% 
  rename(total_votes = VOTES) %>%
  group_by() %>%
  filter(
    election == "primary" &
      year %in% seq(2003, 2015, 4)
  ) %>%
  mutate(year = asnum(year)) %>%
  left_join(
    council_df %>%
      select(WARD16, DIV16, district) %>%
      unique()
  ) %>%
  left_join(council_races)

## FYI: 2015 did not have a Dem Primary in the 10th

council_turnout$is_competitive <- with(
  council_turnout,
  replace(is_competitive, is.na(is_competitive), FALSE)
)

council_turnout <- council_turnout %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16))

council_turnout <- council_turnout %>%
  mutate(competitive_mayor = year %in% c(2015, 2007))


fit_competitive <- lm(
  log(total_votes + 1) ~ 
    as.character(year) +
    WARD_DIVSN +
    is_competitive * competitive_mayor,
  data = council_turnout
) 

coef_council_is_competitiveTRUE <- coef(fit_competitive)['is_competitiveTRUE']

coef_council_mayor_is_competitive_interaction <- coef(fit_competitive)['is_competitiveTRUE:competitive_mayorTRUE']

# ncoef <- length(fit_competitive$coefficients)
# fit_competitive$coefficients %>% tail(4)
# vcov <-vcov(fit_competitive)
# vcov[(nrow(vcov)-2):nrow(vcov), (nrow(vcov)-2):nrow(vcov)] %>% diag %>% sqrt

In an incumbent Mayoral election, the competitive Council districts have turnout 15.3% higher than the non-competitive districts (this estimate includes year and division fixed effects, to control for divisions’ individual turnouts and overall annual swings). Competitive districts only have 3.8% higher turnout when the Mayor’s seat is open, since everyone votes anyway.

Tying it all together

What does this all mean? I’ll make two projections: Low, the pre-2016 typical turnout; and High, assuming the post-2018 surge continues.

Low:

  • Each division’s average turnout for incumbent Mayoral Elections (2003 and 2011)
  • with the 1, 2, 3, 4, and 7th Council Districts contested
  • using typical pre-2016 turnout.

High:

  • Each division’s turnout for only 2011 (since 2003 was distinctly low)
  • with the 1, 2, 3, 4, and 7th Council Districts contested
  • using each division’s proportional surge in 2018.

I’ll only use the 2018 proportional surge (and not the higher 2017 primary surge) because a mayoral race has baseline turnout more similar to a gubernatorial general than a DA primary, so the DA’s race just had much more room for turnout to grow.

For the high-turnout projection I only use 2011 as the baseline, because 2003 had particularly low turnout even for an uncontested primary. This was probably due to relative discontent with incumbent Street, the same sentiment that led to Republican Katz’s strong (but still not close) performance in the general.

View code
projected_turnout <- turnout_wide %>%
  left_join(
    council_df %>%
      filter(year == 2015) %>%
      select(WARD16, DIV16, district) %>%
      unique()
  ) 

projected_turnout <- projected_turnout %>%
  mutate(
    is_contested_2019 = district %in% c(1,2,3,4,7)
  ) %>%
  left_join(
    council_races %>%
      select(year, district, is_competitive) %>%
      filter(year %in% c(2011, 2003)) %>%
      mutate(year = paste0("was_contested_", year)) %>%
      spread(year, is_competitive, fill=FALSE)
  ) 

projected_turnout <- projected_turnout %>%
  mutate(
    baseline_turnout_avg = 0.5 * (
      primary_2003 / ifelse(
        was_contested_2003, exp(coef_council_is_competitiveTRUE), 1 
      ) + 
        primary_2011 / ifelse(
          was_contested_2011, exp(coef_council_is_competitiveTRUE), 1
        )
    ),
    baseline_turnout_2011 =
      primary_2011 / ifelse(
        was_contested_2011, exp(coef_council_is_competitiveTRUE), 1
      )
  )

projected_turnout <- projected_turnout %>%
  mutate(
    competitive_scaling = ifelse(
      is_contested_2019, exp(coef_council_is_competitiveTRUE), 1
    )
  )

projected_turnout <- projected_turnout %>%
  left_join(
    turnout_wide %>% 
      select(
        WARD16, DIV16, primary_2017, 
        general_2018, typical_turnout_da, typical_turnout_governor
      ) %>%
      mutate(
        scale_2017 = primary_2017 / typical_turnout_da,
        scale_2018 = general_2018 / typical_turnout_governor
      )
  ) %>%
  mutate(
    high_projection = baseline_turnout_2011 *
      competitive_scaling *
      # (scale_2018 + scale_2017)/2
      scale_2018,
    low_projection = baseline_turnout_avg *
      competitive_scaling
  )

baseline_turnout <- sum(
  projected_turnout$low_projection,
  na.rm = TRUE
)
surged_turnout <- sum(
  projected_turnout$high_projection, 
  na.rm = TRUE
)

turnout_2019 <- tribble(
  ~year, ~election, ~cycle, ~senate,
  '2019', "primary", "Mayor", FALSE
) %>% full_join(
  data.frame(
    year = '2019',
    sim = c("Low", "High"),
    VOTES = c(baseline_turnout, surged_turnout)
  )
)

turnout_plot("primary") +
  geom_point(
    data=turnout_2019,
    size=3
  ) +
  geom_segment(
    data=turnout_2019,
    aes(
      xend=year,
      yend=VOTES,
      color=cycle
    ),
    x=14,
    y=turnout_total %>% 
      filter(year == 2015 & election == "primary") %>% 
      with(VOTES),
    linetype="dashed"
  ) +
  geom_text(
    data=turnout_2019,
    aes(label = sim),
    vjust = -1
  ) +
  labs(subtitle = "Projections under pre- and post-2016 assumptions.")

plot of chunk projections,

Under typical, pre-2016 assumptions, we would expect 140,000 votes. A surge proportional to the 2018 general would lead to 218,000 votes. Both of these are lower than 2015’s 247,000, because it’s just so hard to match the energy of a competitive Mayoral race, even post-2016. The surging divisions of Krasner’s Base will probably come out strong, but their turnout numbers aren’t strong enough to overcome the presumably typical turnout in the rest of the city.

If you pinned me down, my guess is turnout will be somewhere closer to the High projection. I doubt we’ll reachieve the surge of the 2018 general because that was fueled by huge national attention, and the first post-Trump national race. But there’s certainly something different from 2011, just given the number of candidates alone.

You can download the division-level projections from github. (NOTE: the division-level projections are super noisy, and a few divisions have missing values because of boundaries that don’t line up with any available boundary file (in the 5th Ward). The ward-level sums should be largely right, but read the individual divisions with some caution.)

View code
write.csv(
  projected_turnout %>% 
    select(
      WARD16, DIV16, district,
      high_projection, low_projection, 
      primary_2011, primary_2015, 
      typical_turnout_governor, general_2018, 
      typical_turnout_da, primary_2017
    ),
  file = "turnout_projections_2019.csv",
  row.names=FALSE
)

Will this happen? We’ll find out when the Turnout Tracker returns on election day!

What’s a Ward Endorsement worth?

In 2017, about 6 points

With ballot positions decided, candidates are starting to vie for coveted ward endorsements. How many votes are they really worth?

Two years ago, I did a simplistic analysis of the Court of Common Pleas, where I found that judicial candidates received 0.9 more percent of the vote in wards where they were endorsed. In a race where candidates win with 4.3 percent of the vote, that effect is huge (and larger than even ballot position).

There were a number of caveats to that analysis: I only had endorsements from a few, systematically different wards, and I didn’t do anything to identify causality–we know that candidates do better in wards where they are endorsed, but we don’t know if the endorsements cause that increase, or if the ward leaders were endorsing candidates who would have done well there anyway.

Let’s do better.

In 2017, Max Marin at Philadelphia Weekly undertook the herculean effort of tracking down endorsements in 62 of Philadelphia’s 66 wards. Let’s use that, and do some spatial econometrics.

View code
library(tidyverse)
library(sp)
library(rgeos)
library(rgdal)
library(sf)
source("../../admin_scripts/util.R")

df_major <- safe_load("../../data/processed_data/df_major_2017_12_01.Rda")

df_major$WARD_DIVSN <- with(df_major, paste0(WARD16, DIV16))

df_major <- df_major %>%
  filter(
    election == "primary" & CANDIDATE != "Write In" & PARTY == "DEMOCRATIC"
  )
df_major <- df_major %>%
  group_by(WARD_DIVSN, OFFICE, year) %>%
  mutate(pct_vote = VOTES / sum(VOTES))

df_major <- df_major %>%
  filter(OFFICE %in% c("COUNCIL AT LARGE", "DISTRICT ATTORNEY"))


bg_17_acs <- read.csv("../../data/census/acs_2013_2017_phila_bg_race_income.csv")
bg_17_acs <- bg_17_acs %>% 
  mutate(Geo_FIPS = as.character(Geo_FIPS)) %>%
  select(
    Geo_FIPS, pop, pop_nh_white, pop_nh_black, pop_nh_asian, pop_hisp, pop_median_income_2017
  )

sp_divs <- readOGR("../../data/gis/2016/2016_Ward_Divisions.shp", verbose = FALSE)
sp_divs <- spChFIDs(sp_divs, as.character(sp_divs$WARD_DIVSN))
sp_divs <- spTransform(sp_divs, CRS("+init=EPSG:4326"))

library(tigris)
options(tigris_use_cache = TRUE)
bg_shp <- block_groups(42, 101, year = 2015)
bg_shp <- spChFIDs(bg_shp, as.character(bg_shp$GEOID))
bg_shp <- spTransform(bg_shp, CRS(proj4string(sp_divs)))

sp_divs$bg <- over(
  gCentroid(sp_divs, byid = TRUE), 
  bg_shp
)$GEOID

sp_divs@data <- sp_divs@data %>%
  left_join(bg_17_acs, by = c("bg"="Geo_FIPS"))

df_major <- df_major %>%
  left_join(sp_divs@data) %>%
  mutate(
    pct_wht = pop_nh_white / pop,
    pct_blk = pop_nh_black / pop,
    pct_asian = pop_nh_asian/ pop,
    pct_hisp = pop_hisp / pop
  )

In the 2017 DA race, no single candidate monopolized the endorsements; O’Neill led the way with 11 endorsements, largely in the Northeast.

View code
endorsements <- read_csv("da_2017_endorsements.csv")
endorsements$ward <- sprintf("%02d", endorsements$ward)

da_results <- df_major %>% 
  filter(election == "primary" & year == 2017 & OFFICE == "DISTRICT ATTORNEY") %>%
  mutate(
    last_name = gsub(
      "^.*\\s([A-Z])([A-Z]+)
View code
quot;, "\\U\\1\\L\\2", CANDIDATE, perl = TRUE ) ) %>% group_by(WARD_DIVSN) %>% mutate(total_votes = sum(VOTES)) %>% group_by() %>% mutate(pvote = VOTES / total_votes) da_results$last_name <- with( da_results, ifelse( last_name == "Neill", "O'Neill", ifelse(last_name == "Shabazz", "El-Shabazz", last_name) ) ) da_results %>% group_by(WARD16, last_name) %>% summarise(votes = sum(VOTES)) %>% group_by(WARD16) %>% mutate( ward_votes = sum(votes), pvote = votes/ ward_votes ) %>% left_join( endorsements %>% mutate(is_endorsed = TRUE), by = c("WARD16" = "ward", "last_name" = "endorsement") ) %>% mutate( is_endorsed = replace(is_endorsed, is.na(is_endorsed), FALSE) ) %>% group_by(last_name, is_endorsed) %>% summarise( pct_vote = 100 * weighted.mean(pvote, w = ward_votes), total_votes = sum(ward_votes), n_wards = n() ) %>% group_by(last_name) %>% summarise( pct_vote_overall = weighted.mean(pct_vote, w = total_votes), wards_endorsed = ifelse(any(is_endorsed), n_wards[is_endorsed], 0), turnout_endorsed = ifelse(any(is_endorsed), total_votes[is_endorsed], 0), pct_vote_notendorsed = pct_vote[!is_endorsed], pct_vote_endorsed = ifelse(any(is_endorsed), pct_vote[is_endorsed], NA) ) %>% arrange(desc(pct_vote_overall)) %>% knitr::kable( digits = 0, format = "html", format.args = list(big.mark = ","), col.names = c( "Candidate", "Citywide % of vote", "Number of ward endorsements", "Turnout in endorsed wards", "% of vote in un-endorsed wards", "% of vote in endorsed wards" ) ) 

 

Candidate Citywide % of vote Number of ward endorsements Turnout in endorsed wards % of vote in un-endorsed wards % of vote in endorsed wards
Krasner 38 9 28,700 36 46
Khan 20 8 23,485 18 32
Negrin 14 10 20,794 13 21
El-Shabazz 12 7 19,504 11 17
Untermeyer 8 9 16,200 7 17
O’Neill 6 11 21,520 4 20
Deni 2 0 0 2 NA

Krasner won by over 5,000 votes (18%), despite receiving the typical number of ward endorsements. The endorsements that he did receive came from wards with the highest turnout, but part of that is reverse causality: the places that he energized turned out big.

Naively, candidates did about 11 percentage points better in wards where they were endorsed than in wards where they weren’t. BUT. This suffers from the same lack of causal identification as the Judicial Analysis above: we don’t know if they did better because of the endorsements, or if they were just endorsed in wards where they would have done well anyway.

How can we do better? Let’s use something I noticed in last week’s post on District 7: the strength of boundaries.

The strongest ward endorsements can have visible effects in divisions just across the street from each other.

View code
wards <- readOGR("../../data/gis/2016","2016_Wards", verbose=FALSE) %>%
  spTransform(CRS(proj4string(sp_divs))) 

ggwards <- fortify(spChFIDs(wards, sprintf("%02d", wards$WARD)))

bbox <- sp_divs[substr(sp_divs$WARD_DIVSN,1,2) %in% c("10", "50", "09"),] %>% 
  gUnionCascaded() %>% 
  bbox()


bbox <- rowMeans(bbox) + 1.2 * sweep(bbox, 1, rowMeans(bbox))

polygon_in_bbox <- function(p) {
  coords <- p@Polygons[[1]]@coords
  any(
    coords[,1] > bbox[1,1] &
      coords[,1] < bbox[1,2] &
      coords[,2] > bbox[2,1] &
      coords[,2] < bbox[2,2] 
  )
}
sp_divs$in_bbox <- sapply(sp_divs@polygons, polygon_in_bbox) 

ggdivs <- fortify(spChFIDs(sp_divs, as.character(sp_divs$WARD_DIVSN)))

ggdivs <- ggdivs %>%
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, in_bbox),
    by = c("id" = "WARD_DIVSN")
  ) %>%
  left_join(
    da_results %>% filter(last_name %in% c("Khan", "Krasner", "El-Shabazz")),
    by = c("id" = "WARD_DIVSN")
  )

ward_centroids <- gCentroid(wards, byid=TRUE) %>% as.data.frame()
ward_centroids$ward <- wards$WARD

ggplot(
  ggdivs %>% filter(in_bbox),
  aes(x=long, y=lat)
) +
  geom_polygon(aes(fill = 100 * pvote, group=group), color = NA) +
  geom_polygon(data = ggwards, aes(group=group), fill = NA, color = "white") +
  geom_text(data = ward_centroids, aes(x=x, y=y, label=ward), color = "white") +
  facet_wrap(~last_name) +
  scale_fill_viridis_c("% of vote") +
  theme_map_sixtysix() +
  coord_map(xlim=bbox[1,], ylim=bbox[2,]) +
  theme(
    legend.position = "bottom",
    legend.direction = "horizontal"
  ) +
  ggtitle("Percent of the Vote in Northwest Wards", "2017 DA Race")

plot of chunk map
Wards 10 and 50 endorsed Krasner, Ward 9 endorsed Khan, and Ward 22 endorsed El-Shabazz. You can immediately see the strength of 10 and 50’s endorsements: Krasner did better in divisions inside the boundary of 10 and 50 than he did just across the street. Same for 9, maybe, where Khan did well. And El-Shabazz did better in 22, though there isn’t an obvious boundary effect.

I’ll use this intuition to measure the effect across all boundaries in the whole city. To isolate the causal effect of the wards, I’ll limit the analysis to only compare divisions that are across the street from each other but happen to be divided by a ward boundary, and where different candidates were endorsed. This will ensure that we’re comparing divisions apples-to-apples, where the only thing that’s different is the ward endorsement.

I’ll go one step farther, and control for the census demographics of the block groups that the division sits in, in case a ward boundary happens to also serve as an emergent boundary (dissertation plug). I measure how each candidate’s vote correlated with the race and ethnicity of the neighborhood and subtract that out, leaving a measure of how much better or worse that candidate did than expected. It’s that “residual” that I will compare across boundaries.

View code
da_fit <- lm(
  pvote ~
    CANDIDATE * pct_wht + 
    CANDIDATE * pct_blk + 
    CANDIDATE * pct_hisp,
    # CANDIDATE * log(pop_median_income_2017),
  data = da_results
)

da_results$predicted <- predict(da_fit, newdata = da_results)
da_results$resid <- with(da_results, pvote - predicted)

neighbors <- st_intersection(st_as_sf(sp_divs), st_as_sf(sp_divs))
neighbors <- neighbors %>%
  filter(WARD != WARD.1)

neighbors <- neighbors %>%
  mutate(geometry_type = st_geometry_type(geometry)) %>%
  filter(!geometry_type %in% c("POINT", "MULTIPOINT"))


neighbors <- neighbors %>% 
  mutate(
    WARD.0 = sprintf("%02d", asnum(WARD)),
    WARD.1 = sprintf("%02d", asnum(WARD.1))
  ) %>%
  left_join(
    endorsements %>% rename(endorsement.0 = endorsement),
    by = c("WARD.0" = "ward")
  ) %>%
  left_join(
    endorsements %>% rename(endorsement.1 = endorsement),
    by = c("WARD.1" = "ward")
  ) 

neighbors <- neighbors %>% 
  left_join(
    da_results %>% 
      select(WARD_DIVSN, last_name, total_votes, pvote, resid) %>%
      rename(total_votes.0 = total_votes, pvote.0 = pvote, resid.0 = resid),
    by = c("WARD_DIVSN" = "WARD_DIVSN", "endorsement.0" = "last_name")
  ) %>% 
  left_join(
    da_results %>% 
      select(WARD_DIVSN, last_name, total_votes, pvote, resid) %>%
      rename(total_votes.1 = total_votes, pvote.1 = pvote, resid.1 = resid),
    by = c("WARD_DIVSN.1" = "WARD_DIVSN", "endorsement.0" = "last_name")
  )

To correctly measure wards’ individual strength, I fit a random effects model, which simultaneously estimates the average effect of all wards’ endorsements and how much each individual ward varies from that.

View code
library(lme4)

df0 <- neighbors %>% filter(endorsement.0 != endorsement.1)

fit_lmer <- function(neighbor_df){
  re_fit <- lmer(
    resid.0 - resid.1 ~ (1 | WARD.0),
    data = neighbor_df,
    weights = neighbor_df %>%
      with(pmin(total_votes.0, total_votes.1))
  )

  re <- ranef(re_fit)$WARD.0 
  re <- re %>%
    mutate(
      ward = row.names(re),
      effect = re_fit@beta + `(Intercept)`
    )

  return(
    list(
      fit = re_fit,
      re = re
    )
  )
}

fit_baseline <- fit_lmer(df0)

n_boot <- 200
bs_list <- vector(mode = "list", length = n_boot)
for(b in 1:n_boot){

  sample_divs = sample(unique(df0$WARD_DIVSN), replace = TRUE)

  #if(b %% floor(n_boot / 10) == 0) print(b)
  df_samp <- data.frame(WARD_DIVSN = sample_divs) %>% left_join(df0)
  bs_fit <- fit_lmer(df_samp)
  bs_list[[b]] <- bs_fit
}

fixef_ci <- quantile(
  sapply(bs_list, function(x) fixef(x$fit)),
  c(0.025, 0.975)
)
cat(paste0(
  "Average Effect of a Ward Endorsement:\n",
  sprintf(
    "%0.1f (%0.1f, %0.1f)",
    fixef(fit_baseline$fit)["(Intercept)"] * 100,
    fixef_ci[1] * 100,
    fixef_ci[2] * 100
  )
))
## Average Effect of a Ward Endorsement:
## 5.8 (5.0, 6.9)

The average Ward endorsement was worth 5.8 percentage points in the 2017 DA race. This is about half of the 11 percentage point gap we saw in the naive analysis above; it turns out the other half was because of wards endorsing candidates that the voters already supported.

But some wards are much more important than others.

How does each ward’s endorsement stack up? The table below sorts the wards by order of the vote effect, which is the percentage effect of the endorsement times the 2017 primary turnout.

View code
ranef_ci <- bind_rows(
  lapply(bs_list, function(x) x$re), 
  .id = "sim"
) %>%
  group_by(ward) %>%
  summarise(
    p025 = quantile(effect, 0.025),
    p975 = quantile(effect, 0.975)
  )

fit_baseline$re %>% 
  select(ward, effect) %>% 
  left_join(ranef_ci) %>%
  mutate(
    ci = sprintf("(%0.1f, %0.1f)", 100 * p025, 100*p975)
  ) %>%
  left_join(
    da_results %>% 
      group_by(WARD16, last_name) %>%
      summarise(
        pvote = 100 * weighted.mean(pvote, w = total_votes),
        total_votes = sum(total_votes)
      ) %>%
      inner_join(
        endorsements, 
        by = c("WARD16" = "ward", "last_name" = "endorsement")
      ),
    by = c("ward" = "WARD16")
  ) %>%
  mutate(
    pvote = round(pvote, 0),
    effect = round(100 * effect, 0),
    vote_effect = round(effect/100 * total_votes)
  ) %>%
  rename(endorsement = last_name) %>%
  select(ward, endorsement, pvote, effect, ci, total_votes, vote_effect) %>%
  arrange(desc(vote_effect)) %>% 
  DT::datatable(
    rownames=FALSE,
    colnames = c("Ward", "Endorsee", "% of Vote in Ward", "Endorsement Effect at Boundary","CI", "Ward Votes", "Vote Effect of Endorsement")
  )
Ward Endorsee % of Vote in Ward Endorsement Effect at Boundary CI Ward Votes Vote Effect of Endorsement
10 Krasner 50 14 (8.0, 19.5) 3,719 521
09 Khan 37 12 (6.7, 17.5) 4,264 512
30 Khan 39 14 (8.5, 17.9) 3,403 476
52 Negrin 17 10 (1.1, 20.0) 3,768 377
36 Negrin 18 9 (4.8, 13.6) 3,932 354
56 Untermeyer 26 14 (9.9, 19.5) 2,346 328
50 Krasner 56 6 (-0.7, 15.6) 5,094 306
61 El-Shabazz 24 12 (4.7, 19.3) 2,547 306
40 O’Neill 15 8 (4.8, 14.3) 3,591 287
42 Krasner 45 20 (14.6, 25.2) 1,270 254
38 Negrin 29 10 (3.3, 21.0) 2,507 251
01 O’Neill 13 7 (3.7, 9.1) 2,954 207
03 Untermeyer 22 8 (3.0, 14.3) 2,312 185
63 O’Neill 24 8 (2.4, 16.0) 1,920 154
19 Negrin 50 26 (16.4, 40.6) 589 153
57 O’Neill 26 8 (4.5, 13.0) 1,719 138
65 O’Neill 26 8 (5.9, 11.0) 1,644 132
23 O’Neill 20 10 (5.2, 18.7) 1,284 128
05 Khan 30 2 (-2.5, 7.2) 5,927 119
60 Negrin 15 5 (2.0, 8.2) 2,350 118
07 Negrin 55 21 (6.2, 36.9) 548 115
21 Khan 34 2 (-3.3, 8.7) 5,383 108
31 Khan 23 5 (0.4, 9.0) 2,076 104
24 Untermeyer 11 7 (4.4, 11.0) 1,437 101
51 Untermeyer 13 4 (1.8, 9.8) 2,386 95
46 El-Shabazz 9 2 (-0.9, 5.3) 4,515 90
16 Untermeyer 21 9 (1.7, 13.0) 965 87
12 Krasner 42 3 (-0.7, 7.8) 2,627 79
27 Krasner 70 4 (-3.2, 11.4) 1,978 79
41 Khan 24 7 (3.9, 11.0) 975 68
48 Untermeyer 13 4 (1.8, 7.8) 1,623 65
64 O’Neill 22 7 (-0.8, 13.8) 795 56
58 Untermeyer 15 2 (-6.2, 7.2) 2,606 52
14 Negrin 12 5 (2.8, 7.9) 986 49
34 Krasner 34 1 (-3.6, 5.2) 4,900 49
06 Krasner 47 3 (-10.0, 10.1) 1,605 48
39 O’Neill 25 1 (-1.3, 5.9) 4,462 45
43 Negrin 23 4 (-0.1, 8.4) 1,091 44
25 Untermeyer 14 5 (-1.5, 11.7) 801 40
55 O’Neill 18 3 (-2.8, 7.9) 1,305 39
54 O’Neill 11 4 (0.8, 6.5) 720 29
62 O’Neill 23 2 (-1.8, 5.8) 1,126 23
45 Khan 22 2 (-1.6, 5.5) 891 18
32 El-Shabazz 27 1 (-3.3, 6.7) 1,727 17
33 Khan 26 3 (-1.4, 9.3) 566 17
35 Untermeyer 11 1 (-1.9, 5.6) 1,724 17
04 El-Shabazz 31 0 (-3.1, 4.5) 2,110 0
44 Krasner 37 0 (-5.4, 7.7) 1,416 0
47 El-Shabazz 15 -1 (-5.0, 5.9) 749 -7
29 Negrin 23 -2 (-5.5, 4.2) 1,331 -27
15 Negrin 18 -1 (-5.1, 3.4) 3,692 -37
49 El-Shabazz 19 -3 (-7.1, 0.3) 2,348 -70
22 El-Shabazz 13 -2 (-4.5, 1.3) 5,508 -110
08 Krasner 42 -3 (-6.8, 0.6) 6,091 -183

The most important ward in 2017 was Ward 10, which gave Krasner a 14 percentage point boost on a turnout of 3,719, meaning an estimated bump of 507 votes. (The exact order of the rankings has a lot of uncertainty. Don’t take them as gospel.) Those three Northwest wards we looked at above, 10, 9 and 50, were all in the top seven, with 10 and 9 making up first and second place, largely on the back of their high turnout.

What this means for May

This analysis is specific to the 2017 DA race in a number of ways. I expect ward endorsements to have more importance in low-information races, and all of the races this time around–City Council At Large, Judicial, and Commissioner–will be lower-information than the 2017 DA.

Consider the simplistic analysis I did for the 2017 Court of Common Please. That analysis found that endorsed candidates performed 0.9 percentage points better, in a race that took 4.3% of the vote to win. That estimate is the analog to the 11 point DA effect in the first table. We found that half of the 11 DA points was actually causal, so 0.45 points is a naive guess of the effect in judicial races.

But there are two more changes. First, taking half of the effect is almost certainly too conservative for judges. There are few pre-existing preferences among voters, so much less of that correlation will be “wards endorsing candidates that are already popular”. The causal part will be higher.

But second, the wards that I had data for in that analysis are all the wards with the strongest endorsement effect in this one: 9, 30, 52, and 50 were all among the 18 wards I had data for. So that estimate might be higher, too, than if we had data for every ward.

We end up in between. The Ward endorsements–especially in the top wards on the chart–are effective but not decisive. They are powerful enough that they likely decide close judicial races, but not enough to have changed 2017’s DA race.

Appendix: Ward Map

View code
ggplot(ggwards, aes(x = long, y=lat)) +
  geom_polygon(aes(group=group), fill = strong_green, color = "white") +
  geom_text(data = ward_centroids, aes(x=x, y=y, label=ward), color = "white") +
  theme_map_sixtysix() +
  coord_map() +
  ggtitle("Philadelphia's Wards")

plot of chunk ward_map

The neighborhoods (or wards) that decide District 7

Could Maria Lose?

Maria Quiñones-Sánchez is facing a significant challenge for the second election in a row. The three term councilmember was challenged by Manny Morales four years ago, and eked out a 53.5% – 46.5% win, a margin of only 868 votes. She’s being challenged again this year by state Rep and ward leader Angel Cruz. What should we expect?

The 7th district is decidedly different from Jannie Blackwell’s 3rd or Kenyatta Johnson’s 2nd. Those heavily-segregated districts had clear racial coalitions that swung differently from year to year.

North Philly’s 7th is segregated, but more homogenously so. It’s Philadelphia’s most Hispanic district, with a predominantly Black section of Frankford in the Northeast and the beginnings of White Kensington gentrification expanding up in the South. In the manner of Philadelphia’s Hispanic neighborhoods, it has very low turnout.

But the vote this May won’t split along racial lines. Instead, what matters is the Wards.

Four years ago, three of the district’s ward leaders coordinated to support Quiñones-Sánchez’s challenger, with heavy involvement from Johnny Doc’s Local 98, including more than $100,000 in contributions and over 1,000 “voter assist requests” for help in the voting booth.

The leaders are coordinating a challenge again, with a candidate Cruz who has apparently less baggage than Morales.

(I can’t quite do justice to the story of this race. Check out the links above and Billy Penn’s summary. )

The challenge seems stronger this year, with eight of the twelve ward leaders in the district supporting Cruz in the party endorsement. But Quiñones-Sánchez held it off last year. What should we expect?

District 7’s voting blocks

The 7th Council district covers North Philly, with 6th and 9th Streets serving as the Western border.

View code
library(tidyverse)
library(rgdal)
library(rgeos)
library(sp)
library(ggmap)

sp_council <- readOGR("../../../data/gis/city_council/Council_Districts_2016.shp", verbose = FALSE)
sp_council <- spChFIDs(sp_council, as.character(sp_council$DISTRICT))

sp_divs <- readOGR("../../../data/gis/2016/2016_Ward_Divisions.shp", verbose = FALSE)
sp_divs <- spChFIDs(sp_divs, as.character(sp_divs$WARD_DIVSN))
sp_divs <- spTransform(sp_divs, CRS(proj4string(sp_council)))

load("../../../data/processed_data/df_major_2017_12_01.Rda")

ggcouncil <- fortify(sp_council) %>% mutate(council_district = id)
ggdivs <- fortify(sp_divs) %>% mutate(WARD_DIVSN = id)
View code
## Need to add District result election from 2015
raw_d2 <-  read.csv("../../../data/raw_election_data/2015_primary.csv") 
raw_d2 <- raw_d2 %>% 
  filter(OFFICE == "DISTRICT COUNCIL-7TH DISTRICT-DEM") %>%
  mutate(
    WARD = sprintf("%02d", asnum(WARD)),
    DIV = sprintf("%02d", asnum(DIVISION))
  )

load('../../../data/gis_crosswalks/div_crosswalk_2013_to_2016.Rda')
crosswalk_to_16 <- crosswalk_to_16 %>% group_by() %>%
  mutate(
    WARD = sprintf("%02s", as.character(WARD)),
    DIV = sprintf("%02s", as.character(DIV))
  )

d2 <- raw_d2 %>% 
  left_join(crosswalk_to_16) %>%
  group_by(WARD16, DIV16, OFFICE, CANDIDATE) %>%
  summarise(VOTES = sum(VOTES * weight_to_16)) %>%
  mutate(PARTY="DEMOCRATIC", year="2015", election="primary")
df_major <- bind_rows(df_major, d2)
View code
races <- tribble(
  ~year, ~OFFICE, ~office_name,
  "2015", "MAYOR", "Mayor",
  "2015", "DISTRICT COUNCIL-7TH DISTRICT-DEM", "Council 7th District",
  "2016", "PRESIDENT OF THE UNITED STATES", "President",
  "2017", "DISTRICT ATTORNEY", "District Attorney"
) %>% mutate(election_name = paste(year, office_name))

candidate_votes <- df_major %>% 
  filter(election == "primary" & PARTY == "DEMOCRATIC") %>%
  inner_join(races %>% select(year, OFFICE)) %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  mutate(
    total_votes = sum(VOTES),
    pvote = VOTES / sum(VOTES)
  ) %>% 
  group_by()
  
turnout_df <- candidate_votes %>%
  filter(!grepl("COUNCIL", OFFICE)) %>% 
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  summarise(total_votes = sum(VOTES)) %>%
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, AREA_SFT)
  )

turnout_df$AREA_SFT <- asnum(turnout_df$AREA_SFT)
View code
get_labpt_df <- function(sp){
  mat <- sapply(sp@polygons, slot, "labpt")
  df <- data.frame(x = mat[1,], y=mat[2,])
  return(
    cbind(sp@data, df)
  )
}

ggplot(ggcouncil, aes(x=long, y=lat)) +
  geom_polygon(
    aes(group=group),
    fill = strong_green, color = "white", size = 1
  ) +
  geom_text(
    data = get_labpt_df(sp_council),
    aes(x=x,y=y,label=DISTRICT)
  ) +
  theme_map_sixtysix() +
  coord_map() +
  ggtitle("Council Districts")

plot of chunk council_map

View code
DISTRICT <- "7"
sp_district <- sp_council[row.names(sp_council) == DISTRICT,]

bbox <- sp_district@bbox
## expand the bbox 20%for mapping
bbox <- rowMeans(bbox) + 1.2 * sweep(bbox, 1, rowMeans(bbox))

if(file.exists("map_cache.Rda")){
  load("map_cache.Rda")
} else {
    basemap <- get_map(bbox, maptype="toner-lite", filename="map_cache.png")
    save(basemap, file="map_cache.Rda")
}

district_map <- ggmap(
  basemap, 
  extent="normal", 
  base_layer=ggplot(ggcouncil, aes(x=long, y=lat, group=group)),
  maprange = FALSE
) 
## without basemap:
# district_map <- ggplot(ggcouncil, aes(x=long, y=lat, group=group))

district_map <- district_map +
  theme_map_sixtysix() +
  coord_map(xlim=bbox[1,], ylim=bbox[2,])


sp_divs$council_district <- over(
  gCentroid(sp_divs, byid = TRUE), 
  sp_council
)$DISTRICT

polygon_in_bbox <- function(p) {
  coords <- p@Polygons[[1]]@coords
  any(
    coords[,1] > bbox[1,1] &
      coords[,1] < bbox[1,2] &
      coords[,2] > bbox[2,1] &
      coords[,2] < bbox[2,2] 
  )
}

sp_divs$in_bbox <- sapply(
  sp_divs@polygons,
  polygon_in_bbox
)

ggdivs <- ggdivs %>% 
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, in_bbox)
  )

district_map +
  geom_polygon(
    aes(alpha = (id == DISTRICT)),
    fill="black",
    color = "grey50",
    size=2
  ) +
  scale_alpha_manual(values = c(`TRUE` = 0.2, `FALSE` = 0), guide = FALSE) +
  ggtitle(sprintf("Council District %s", DISTRICT))

plot of chunk district_map The district has the lowest turnout in the city. Philadelphia’s Hispanic neighborhoods have very low turnout, and this district is the most Hispanic. Curiously, not only does the neighborhood have low turnout in Presidential elections, but it then has disproportionately lower turnout in municipal elections even given that: even residents who vote for President are less likely to vote in other years.

View code
# hist(turnout_df$total_votes / turnout_df$AREA_SFT, breaks = 1000)

turnout_df <- turnout_df %>%
  left_join(races)

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(turnout_df, by =c("id" = "WARD_DIVSN")),
    aes(fill = pmin(total_votes / AREA_SFT, 0.0005) * 5280^2)
  ) +
  scale_fill_viridis_c(
    "Votes per mile", 
    labels=scales::comma, 
    guide=guide_colorbar(label.theme=element_text(angle=90, size = 10), label.hjust=1)
  ) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  facet_wrap(~ election_name) +
  expand_limits(fill=0) +
  ggtitle(
    "Votes per mile in the Democratic Primary", 
    sprintf("Council District %s", DISTRICT)
  ) +
  theme(legend.position = "bottom", legend.direction = "horizontal")

plot of chunk turnout_map The district as a whole cast 25,000 votes in the 2016 Presidential primary, 12,000 last time Quninones-Sanchez ran, and only 6,000 in the the 2017 District Attorney race. They did not see the Krasner surge.

Demographically, the district is Philadelphia’s most heavily hispanic, though with Frankford in its Northeast being predominantly Black:

View code
bg_17_acs <- read.csv("../../../data/census/acs_2013_2017_phila_bg_race_income.csv")
bg_17_acs <- bg_17_acs %>% 
  mutate(Geo_FIPS = as.character(Geo_FIPS)) %>%
  select(Geo_FIPS, pop, pop_nh_white, pop_nh_black, pop_nh_asian, pop_hisp, pop_median_income_2017)

library(tigris)
options(tigris_use_cache = TRUE)
bg_shp <- block_groups(42, 101, year = 2015)
bg_shp <- spChFIDs(bg_shp, as.character(bg_shp$GEOID))
bg_shp <- spTransform(bg_shp, CRS(proj4string(sp_divs)))

bg_shp$in_bbox <- sapply(
  bg_shp@polygons,
  polygon_in_bbox
)

gg_bgs <- fortify(bg_shp)
gg_bgs <- gg_bgs %>%
  left_join(bg_shp@data[,c("GEOID", "ALAND", "in_bbox")], by = c("id" = "GEOID")) %>%
  left_join(bg_17_acs, by = c("id" = "Geo_FIPS"))

district_map +
  geom_polygon(
    data = gg_bgs %>%
      filter(in_bbox) %>%
      gather("key", "race_pop",pop_nh_white, pop_nh_black, pop_nh_asian, pop_hisp) %>%
      mutate(
        pct_pop = 100 * race_pop / pop,
        race = c(
          pop_hisp = "Hispanic", pop_nh_white="NH White", pop_nh_black="Black", pop_nh_asian="Asian"
        )[key]
      ),
    aes(fill = pct_pop)
  ) + 
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  facet_wrap(~race) +
  scale_fill_viridis_c("Percent of\n Population") +
  theme(legend.position = "right") +
  ggtitle(sprintf("Race and Ethnicity of District %s", DISTRICT))

plot of chunk census

The district is less obviously politically split than other parts of the city. When it does split, it often does so for Latino candidates. Below are the results from the last race for the 7th and three other recent, compelling Democratic Primary races: 2015 Mayor, 2016 President, and 2017 District Attorney. The maps below show the vote for the top two candidates in District 2 (except for City Council in 2015, where I use Helen Gym and Isaiah Thomas, who were 4th and 5th in the district, and 5th and 6th citywide.)

View code
candidate_votes <- candidate_votes %>%
  left_join(sp_divs@data %>% select(WARD_DIVSN, council_district))

## Choose the top two candidates in district 3
## Except for city council, where we choose Gym and Thomas
# candidate_votes %>%
#   group_by(OFFICE, year, CANDIDATE) %>%
#   summarise(
#     city_votes = sum(VOTES),
#     district_votes = sum(VOTES * (council_district == DISTRICT))
#   ) %>%
#   arrange(desc(city_votes)) %>%
#   filter(OFFICE == "MAYOR")

candidates_to_compare <- tribble(
  ~year, ~OFFICE, ~CANDIDATE, ~candidate_name, ~row,
  "2015", "DISTRICT COUNCIL-7TH DISTRICT-DEM", "MANNY MORALES", "Manny Morales", 1,
  "2015", "DISTRICT COUNCIL-7TH DISTRICT-DEM", "MARIA QUINONES SANCHEZ", "Maria Quiñones-Sánchez", 2,
  "2015", "MAYOR", "JIM KENNEY", "Jim Kenney",  2,
  "2015", "MAYOR", "NELSON DIAZ", "Nelson Diaz", 1,
  "2016", "PRESIDENT OF THE UNITED STATES", "BERNIE SANDERS", "Bernie Sanders", 2,
  "2016", "PRESIDENT OF THE UNITED STATES", "HILLARY CLINTON", "Hillary Clinton", 1,
  "2017", "DISTRICT ATTORNEY", "LAWRENCE S KRASNER", "Larry Krasner", 2,
  "2017", "DISTRICT ATTORNEY", "RICH NEGRIN","Rich Negrin", 1
)

candidate_votes <- candidate_votes %>%
  left_join(races) %>%
  left_join(candidates_to_compare)

vote_adjustment <- function(pct_vote, office){
  ifelse(office == "COUNCIL AT LARGE", pct_vote * 4, pct_vote)
}

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(
        candidate_votes %>% filter(!is.na(row))
      ),
    aes(fill = 100 * vote_adjustment(pvote, OFFICE))
  ) +
  scale_fill_viridis_c("Percent of Vote") +
  theme(
    legend.position =  "bottom",
    legend.direction = "horizontal",
    legend.justification = "center"
  ) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  geom_label(
    data=candidates_to_compare %>% left_join(races),
    aes(label = candidate_name),
    group=NA,
    hjust=0, vjust=1,
    x=bbox[1,1],
    y=bbox[2,2]
  ) +
  facet_grid(row ~ election_name) +
  theme(strip.text.y = element_blank()) +
  ggtitle(
    sprintf("Candidate performance in District %s", DISTRICT) 
    # "Percent of vote (times 4 for Council, times 1 for other offices)"
  )

plot of chunk proportion The district split for Mayor and for District Attorney along racial lines: the densest latino neighborhoods voted for Diaz and Negrin, while the rest of the district voted for the citywide winners Kenney and Krasner. However, the cleavage was geographically different for Quiñones-Sánchez v Morales, presumably because both candidates were latinx. That’s true this time around, too.

Just because there wasn’t a racial split doesn’t mean the district voted uniformly. There indeed were stark variations in how Quiñones-Sánchez did, but the reason isn’t obviously clear. She did well in the Northwest of the district, especially West of 6th St, and in the Northeast of the district except for South of the Boolevard and East of Oxford.

What gives? Those are all Ward boundaries.

View code
wards <- readOGR("../../../data/gis/2016","2016_Wards", verbose=FALSE) %>%
  spTransform(CRS(proj4string(sp_divs)))

ggwards <- fortify(wards)

wards_centers <- sapply(wards@polygons, slot, "labpt") %>% t
wards_centers <- as.data.frame(wards_centers)
names(wards_centers) <- c("x", "y")

wards@data <- cbind(wards@data, wards_centers)


district_map + 
  geom_polygon(data = ggwards, fill = NA, color = strong_red, size= 2) +
  geom_polygon(
    data = ggcouncil %>% filter(council_district == 7), 
    fill=NA, 
    color = "black", 
    size = 2
  ) +
  geom_text(
    data = wards@data,
    aes(x=x, y=y, label=WARD),
    group = NA,
    color = strong_red,
    fontface="bold"
  ) +
  ggtitle("The wards of District 7")

plot of chunk wards

Quiñones-Sánchez’s performance is a potent example of the power of Ward endorsements: she performed very differently in different wards, in some cases on literally the other side of the street. That thin sliver in the Northwest of the District where she did exceptionally well was the 43rd Ward. She received large percentages in the 23rd and 54th in the Northwest, too, but that region where she did poorly, South of the Boolevard and East of Oxford, exactly lines up with the boundaries to the 62nd.

So the question for the 2019 race becomes how the ward endorsements will shake out and how powerful each ward is, both in terms of ability to swing the vote and typicaly turnout. Below are measures of their strength. I’ve also pulled in the Ward leaders’ endorsements from the DCC vote.

View code
## Get vote-weighted populations
div_centroids <- gCentroid(sp_divs[sp_divs$council_district == DISTRICT,], byid=TRUE)
div_centroids$WARD_DIVSN <- attr(div_centroids@coords, "dimnames")[[1]]
div_centroids$bg_GEOID <- over(div_centroids, bg_shp)$GEOID
div_centroids@data <- left_join(div_centroids@data, bg_17_acs, by = c("bg_GEOID" = "Geo_FIPS"))

district_7_results <- df_major %>%
  filter(
    year == 2015 & grepl("7TH DISTRICT", OFFICE) & CANDIDATE != "Write In"
  ) %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  select(WARD_DIVSN, CANDIDATE, VOTES) %>%
  spread(CANDIDATE, VOTES) %>%
  mutate(
    total_votes = (`MARIA QUINONES SANCHEZ` + `MANNY MORALES`),
    p_quinones_sanchez = `MARIA QUINONES SANCHEZ` / total_votes
  )

div_centroids@data <- left_join(div_centroids@data, district_7_results)

ward_pops <- div_centroids@data %>%
  mutate(ward = substr(WARD_DIVSN, 1, 2)) %>%
  group_by(ward) %>%
  summarise(
    p_quinones_sanchez = 100 * weighted.mean(p_quinones_sanchez, w = total_votes),
    pct_nh_white = 100 * weighted.mean(pop_nh_white / pop, w = total_votes),
    pct_nh_black = 100 * weighted.mean(pop_nh_black / pop, w = total_votes),
    pct_nh_asian = 100 * weighted.mean(pop_nh_asian / pop, w = total_votes),
    pct_hisp = 100 * weighted.mean(pop_hisp / pop, w = total_votes),
    council_votes = sum(total_votes)
  )

ward_results <- df_major %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  inner_join(
    tribble(
      ~year, ~election, ~PARTY, ~OFFICE,
      "2016", "primary", "DEMOCRATIC", "PRESIDENT OF THE UNITED STATES",
      "2017", "primary", "DEMOCRATIC", "DISTRICT ATTORNEY"
    )
  ) %>%
  inner_join(div_centroids@data) %>%
  group_by(year, WARD16) %>%
  summarise(total_votes = sum(VOTES)) %>%
  group_by() %>%
  group_by(year) %>%
  mutate(pct_of_year = 100 * total_votes / sum(total_votes)) %>%
  gather("key", "value", total_votes, pct_of_year) %>%
  unite("key", key, year) %>%
  spread(key, value)
    
ward_pops <- ward_pops %>% 
  left_join(
    ward_results %>% rename(ward = WARD16)
  ) %>%
  group_by() %>%
  mutate(pct_of_year_2015 = 100 * council_votes / sum(council_votes))

ward_leaders <- tribble(
  ~ward, ~leader, ~endorsement,
  "07", "Angel Cruz", "Cruz",
  "18", "Theresa Alicea", "Cruz",
  "19", "Carlos Matos","Cruz",
  "23", "Timothy Savage","Quiñones-Sánchez",
  "25", "Thomas Johnson","Cruz",
  "31", "Margaret Rzepski","Cruz",
  "33", "Donna Aument","Cruz",
  "42", "Sharon Vaughn","Quiñones-Sánchez",
  "43", "Emilio Vazquez","Quiñones-Sánchez",
  "49", "Shirley Gregory","Cruz",
  "54", "Alan Butkovitz","Quiñones-Sánchez",
  "62", "Margaret Tartaglione","Cruz"
)

ward_pops <- ward_pops %>% left_join(ward_leaders)

knitr::kable(
  ward_pops %>%
    select(ward, council_votes, pct_of_year_2015, p_quinones_sanchez, leader, endorsement, pct_hisp, pct_nh_white, pct_nh_black, total_votes_2016, pct_of_year_2016, total_votes_2017, pct_of_year_2017) %>%
    arrange(desc(council_votes)),
    digits=0, 
    format.args=list(big.mark=','),
    col.names=c("Ward", "Votes for the 7th in 2015 Primary", "Pct of District", "% for Quiñones-Sánchez",  "Leader", "Endorsement", "% Hispanic", "% White", "% Black", "Votes in 2016 Primary", "Pct of District", "Votes in 2017 Primary", "Pct of District")
  )
Ward Votes for the 7th in 2015 Primary Pct of District % for Quiñones-Sánchez Leader Endorsement % Hispanic % White % Black Votes in 2016 Primary Pct of District Votes in 2017 Primary Pct of District
23 2,108 17 71 Timothy Savage Quiñones-Sánchez 28 19 46 4,021 16 1,284 22
62 1,774 14 33 Margaret Tartaglione Cruz 31 22 41 3,546 14 913 16
19 1,683 14 37 Carlos Matos Cruz 77 6 15 2,817 11 589 10
07 1,655 14 49 Angel Cruz Cruz 80 8 11 3,340 13 549 9
33 1,455 12 57 Donna Aument Cruz 63 10 18 3,337 13 566 10
42 1,177 10 63 Sharon Vaughn Quiñones-Sánchez 64 8 16 2,581 10 543 9
43 1,124 9 65 Emilio Vazquez Quiñones-Sánchez 66 3 29 2,259 9 445 8
18 611 5 51 Theresa Alicea Cruz 45 29 16 1,340 5 527 9
54 311 3 66 Alan Butkovitz Quiñones-Sánchez 17 18 49 906 4 196 3
25 133 1 59 Thomas Johnson Cruz 47 33 15 350 1 77 1
49 117 1 72 Shirley Gregory Cruz 28 0 72 192 1 44 1
31 98 1 63 Margaret Rzepski Cruz 23 57 8 265 1 129 2

The ward with the most votes for the 7th in 2015 also voted hard for Quiñones-Sánchez: the 23rd. Her 71% dominance of the 2,108 votes cast gave her an 886 vote edge, more than what she won the entire District by. That ward, which includes the district’s predominantly Black neighborhoods, turns out stronger than the rest. It represented 17% of the district’s votes in 2015, and then a whopping 22% of the votes in low-turnout 2017.

The next three most dominant wards were 62, 19, and 7 and those went for Morales with 63, 67, and 51% of the vote. (Notice that 62 is actually split by Council Districts; the number above is only for District 7). Those are the three wards led by State Senator Margaret Tartaglione, Carlos Matos, and Angel Cruz himself, the ward leaders that organized the challenge. Quiñones-Sánchez won all of the other Wards, but those three were enough to make the race close.

We can simplify the table above by combining all of the wards whose leaders supported Quiñones-Sánchez and those whose leaders supported Cruz.

View code
get_line <- function(x_total_votes, y_total_votes){
  ## solve p_x t_x+ p_y t_y > 50
  tot <- x_total_votes + y_total_votes
  tx <- x_total_votes / tot
  ty <- y_total_votes / tot

  slope <- -tx / ty
  intercept <- 50 / ty  # use 50 since proportions are x100
  c(intercept, slope)
}

endorsement_summary <- ward_pops %>%
  group_by(endorsement) %>%
  summarise(
    p_quinones_sanchez = weighted.mean(p_quinones_sanchez, w = council_votes),
    council_votes = sum(council_votes),
    total_votes_2016 = sum(total_votes_2016),
    total_votes_2017 = sum(total_votes_2017)
  )

candidate_results = with(
  endorsement_summary,
  tribble(
    ~candidate, 
    ~p_in_mqs_wards,
    ~p_in_challenger_wards, 
    ~total_votes_in_mqs_wards, 
    ~total_votes_in_challenger_wards,
    "Quiñones-Sánchez", 
    p_quinones_sanchez[endorsement == "Quiñones-Sánchez"],
    p_quinones_sanchez[endorsement == "Cruz"],
    council_votes[endorsement == "Quiñones-Sánchez"],
    council_votes[endorsement == "Cruz"],
    "Morales", 
    100 - p_quinones_sanchez[endorsement == "Quiñones-Sánchez"],
    100 - p_quinones_sanchez[endorsement == "Cruz"],
    council_votes[endorsement == "Quiñones-Sánchez"],
    council_votes[endorsement == "Cruz"]
  )
) %>%
  mutate(
    votes_in_mqs_wards = p_in_mqs_wards * total_votes_in_mqs_wards / 100,
    votes_in_challenger_wards = p_in_challenger_wards * total_votes_in_challenger_wards / 100
  )

knitr::kable(
  candidate_results %>% select(
    candidate,
    p_in_mqs_wards,
    votes_in_mqs_wards,
    p_in_challenger_wards,
    votes_in_challenger_wards
  ),
  digits=0, 
  format.args=list(big.mark=','),
  col.names=c("Candidate", "Percent in MQS-Endorsed Wards", "Votes in MQS-Endorsed Wards", "Percent in Cruz-Endorsed Wards",  "Votes in Cruz-Endorsed Wards")
)
Candidate Percent in MQS-Endorsed Wards Votes in MQS-Endorsed Wards Percent in Cruz-Endorsed Wards Votes in Cruz-Endorsed Wards
Quiñones-Sánchez 67 3,174 45 3,383
Morales 33 1,546 55 4,143

The four wards that backed Quiñones-Sánchez constituted only 39% of the votes in 2015, but she won them 2:1. The other eight wards combined for 61% of the vote, but Morales only won them 5:4. Quiñones-Sánchez’s wards represented more of the votes in 2017, boding well for this year, but it’s also likely that Cruz will do better in his wards than Morales did.

View code
line_2017 <- with(
  endorsement_summary,
  get_line(
    total_votes_2017[endorsement == "Quiñones-Sánchez"],
    total_votes_2017[endorsement == "Cruz"]
  )
)

line_2015 <- with(
  endorsement_summary,
  get_line(
    council_votes[endorsement == "Quiñones-Sánchez"],
    council_votes[endorsement == "Cruz"]
  )
)

library(ggrepel)
ggplot(
  candidate_results,
  aes(
    x=p_in_mqs_wards,
    y=p_in_challenger_wards
  )
) +
  geom_point() +
  geom_text_repel(aes(label=candidate)) +
  geom_abline(
    intercept = c(line_2015[1], line_2017[1]),
    slope = c(line_2015[2], line_2017[2]),
    linetype="dashed"
  ) +
  coord_fixed() +
  scale_x_continuous(
    "Percent in wards that endorsed Quiñones-Sánchez",
    breaks = seq(0,100,10)
  ) +
  scale_y_continuous(
    "Percent in wards that endorsed Cruz",
    breaks = seq(0, 100, 10)
  ) +
  annotate(
    geom="text",
    label=paste(c(2015, 2017), "turnout"),
    x=c(10, 8),
    y=c(
      line_2015[1] + 10 * line_2015[2],
      line_2017[1] + 8 * line_2017[2]
    ),
    hjust=0,
    vjust=-0.2,
    angle = atan(c(line_2015[2], line_2017[2])) / pi * 180,
    color="grey40"
  )+
  annotate(
    geom="text",
    x = 80,
    y=75,
    label="Candidate wins",
    fontface="bold",
    color = strong_green
  ) +
  geom_hline(yintercept = 50, color="grey50") +
  geom_vline(xintercept = 50, color="grey50")+
  expand_limits(x=100, y=30)+
  theme_sixtysix() +
  ggtitle(
    "The strength of District 7's wards in 2015",
    "Candidates to the top-right of dashed lines win."
  )

plot of chunk plot

Looking to May

So what are we to make of this race? Keep your eyes on the mobilization behind the Ward endorsements, especially the top turnout wards. Cruz will presumably do even better in his own ward than Morales did, so Quiñones-Sánchez’s success will likely hinge on continued high turnout in the 23rd and trying to consolidate the votes of the rest.

She managed to do that just well enough four years ago to eke out a 9 point victory. This year she faces a candidate without homophobic Facebook posts who also happens to be a Ward leader. If the results in every district were the same as in 2015 except Angel Cruz managed to win 77% of the vote in his own ward, the race would be an exact tie.

It’ll be close.

Predicting Elections Tutorial!

I’m teaching a workshop at Penn’s Master’s of Urban Spatial Analytics on April 22nd.

I’ve posted all of the materials on github, including RMarkdown walkthroughs. Want to learn how I predicted the 2018 election and only made one bad mistake? Check it out!

Note: As an early tester, by reading the posts you commit to sending me feedback. Preferably before April 22nd. thx.

Post 1: The Relational Database. How I’ve organized the election data.

Post 2: Geographies. How I crosswalked geographies across moving boundaries.

Post 3: Creating the rectangular data.frame. Final steps to get ready to model.

Post 4: Predicting the election. The good stuff! (You can skip the others. This is what I’ll be teaching.)

The Turnout Tracker is open sourced. And going to Chicago!

The Turnout Tracker: An Introduction

The turnout tracker is a citizen science tool to track election turnout in real time.

In May 2017, I noticed that Philadelphians had organically started sharing on social media where and when they voted, and what number voter they were at their precinct.

I thought “Wow, all that needs is a statistical model to know what turnout is across the city.” So I built it.

We’re going to Chicago

Chicagoans, I need your help! The municipal runoff elections are April 2nd. Let’s track turnout together.

Before Election Day

  • Tell your friends! Share this post!

On Election Day

Open Sourcing the Turnout Tracker

I’m also sharing the code behind the Turnout Tracker with the world. I’ve cleaned it up some, but better engineering and documentation is a work in progress.

Check out the repository at https://github.com/jtannen/turnout_tracker

Programmer/Data Scientist? The codebase may not be fully self-serve yet. So if you want to bring the Turnout Tracker to your city, get in touch. jonathan (dot) tannen (at) gmail (dot) com

_sunglassesemoji_

The neighborhoods that decide Council District 2

[Note 2019-03-09: this post has been heavily updated thanks to an insightful suggestion from @DanthePHLman]

Could Kenyatta Lose?

Kenyatta Johnson, the two term councilmember from Southwest and South Philly’s District 2, is being challenged by Lauren Vidas, the former assistant finance director under Mayor Nutter. Johnson dominated a challenge from developer Ori Feibush four years ago, but has since been mired in land deal scandals. In Wednesday’s post, I claimed District 3’s challenger faced a plausible but steep path. How about for District 2? What would it take for Vidas to win?

Johnson’s District 2 is quite different from West Philly’s District 3. The gentrification has covered less ground, and Graduate Hospital didn’t take to Bernie Sanders and Larry Krasner in the same way that University City did. On the other hand, Johnson’s recent scandals will likely kneecap his 2015 popularity, and Vidas occupies a quite different lane than developer Feibush.

What are the neighborhood cohorts that will decide District 2? If Johnson holds, what neighborhoods will he have done well in? If Vidas’s challenge is successful, which neighborhoods’ vote will she have monopolized?

District 2’s voting blocks

The voting blocks for District 2 are less distinct than for District 3: there’s the pro-Kenyatta base of Point Breeze, the challenger base of Grad Hospital and a nub of East Passyunk, and then there’s Southwest Philly, which is somewhere in between.

View code
library(tidyverse)
library(rgdal)
library(rgeos)
library(sp)
library(ggmap)

sp_council <- readOGR("../../../data/gis/city_council/Council_Districts_2016.shp", verbose = FALSE)
sp_council <- spChFIDs(sp_council, as.character(sp_council$DISTRICT))

sp_divs <- readOGR("../../../data/gis/2016/2016_Ward_Divisions.shp", verbose = FALSE)
sp_divs <- spChFIDs(sp_divs, as.character(sp_divs$WARD_DIVSN))
sp_divs <- spTransform(sp_divs, CRS(proj4string(sp_council)))

load("../../../data/processed_data/df_major_2017_12_01.Rda")

ggcouncil <- fortify(sp_council) %>% mutate(council_district = id)
ggdivs <- fortify(sp_divs) %>% mutate(WARD_DIVSN = id)
View code
## Need to add District 2 election from 2015
raw_d2 <-  read.csv("../../../data/raw_election_data/2015_primary.csv") 
raw_d2 <- raw_d2 %>% 
  filter(OFFICE == "DISTRICT COUNCIL-2ND DISTRICT-DEM") %>%
  mutate(
    WARD = sprintf("%02d", asnum(WARD)),
    DIV = sprintf("%02d", asnum(DIVISION))
  )

load('../../../data/gis_crosswalks/div_crosswalk_2013_to_2016.Rda')
crosswalk_to_16 <- crosswalk_to_16 %>% group_by() %>%
  mutate(
    WARD = sprintf("%02s", as.character(WARD)),
    DIV = sprintf("%02s", as.character(DIV))
  )

d2 <- raw_d2 %>% 
  left_join(crosswalk_to_16) %>%
  group_by(WARD16, DIV16, OFFICE, CANDIDATE) %>%
  summarise(VOTES = sum(VOTES * weight_to_16)) %>%
  mutate(PARTY="DEMOCRATIC", year="2015", election="primary")
df_major <- bind_rows(df_major, d2)
View code
races <- tribble(
  ~year, ~OFFICE, ~office_name,
  "2015", "MAYOR", "Mayor",
  "2015", "DISTRICT COUNCIL-2ND DISTRICT-DEM", "Council 2nd District",
  "2015", "COUNCIL AT LARGE", "City Council At Large",
  "2016", "PRESIDENT OF THE UNITED STATES", "President",
  "2017", "DISTRICT ATTORNEY", "District Attorney"
) %>% mutate(election_name = paste(year, office_name))

candidate_votes <- df_major %>% 
  filter(election == "primary" & PARTY == "DEMOCRATIC") %>%
  inner_join(races %>% select(year, OFFICE)) %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  mutate(
    total_votes = sum(VOTES),
    pvote = VOTES / sum(VOTES)
  ) %>% 
  group_by()

turnout_df <- candidate_votes %>%
  filter(!grepl("COUNCIL", OFFICE)) %>% 
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  summarise(total_votes = sum(VOTES)) %>%
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, AREA_SFT)
  )

turnout_df$AREA_SFT <- asnum(turnout_df$AREA_SFT)

The second council district covers Southwest Philly, and parts of South Philly including Point Breeze and Graduate Hospital.

View code
get_labpt_df <- function(sp){
  mat <- sapply(sp@polygons, slot, "labpt")
  df <- data.frame(x = mat[1,], y=mat[2,])
  return(
    cbind(sp@data, df)
  )
}

ggplot(ggcouncil, aes(x=long, y=lat)) +
  geom_polygon(
    aes(group=group),
    fill = strong_green, color = "white", size = 1
  ) +
  geom_text(
    data = get_labpt_df(sp_council),
    aes(x=x,y=y,label=DISTRICT)
  ) +
  theme_map_sixtysix() +
  coord_map() +
  ggtitle("Council Districts")

plot of chunk council_map

View code
DISTRICT <- "2"
sp_district <- sp_council[row.names(sp_council) == DISTRICT,]

bbox <- sp_district@bbox
## expand the bbox 20%for mapping
bbox <- rowMeans(bbox) + 1.2 * sweep(bbox, 1, rowMeans(bbox))

basemap <- get_map(bbox, maptype="toner-lite")

district_map <- ggmap(
  basemap, 
  extent="normal", 
  base_layer=ggplot(ggcouncil, aes(x=long, y=lat, group=group)),
  maprange = FALSE
) 
## without basemap:
# district_map <- ggplot(ggcouncil, aes(x=long, y=lat, group=group))

district_map <- district_map +
  theme_map_sixtysix() +
  coord_map(xlim=bbox[1,], ylim=bbox[2,])


sp_divs$council_district <- over(
  gCentroid(sp_divs, byid = TRUE), 
  sp_council
)$DISTRICT

sp_divs$in_bbox <- sapply(
  sp_divs@polygons,
  function(p) {
    coords <- p@Polygons[[1]]@coords
    any(
      coords[,1] > bbox[1,1] &
      coords[,1] < bbox[1,2] &
      coords[,2] > bbox[2,1] &
      coords[,2] < bbox[2,2] 
    )
  }
)

ggdivs <- ggdivs %>% 
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, in_bbox)
  )

district_map +
  geom_polygon(
    aes(alpha = (id == DISTRICT)),
    fill="black",
    color = "grey50",
    size=2
  ) +
  scale_alpha_manual(values = c(`TRUE` = 0.2, `FALSE` = 0), guide = FALSE) +
  ggtitle(sprintf("Council District %s", DISTRICT))

plot of chunk district_map
Despite the large expanse of land, the vast majority of the district’s votes come from Center City and northern South Philly.

View code
# hist(turnout_df$total_votes / turnout_df$AREA_SFT, breaks = 1000)

turnout_df <- turnout_df %>%
  left_join(races)

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(turnout_df, by =c("id" = "WARD_DIVSN")),
    aes(fill = pmin(total_votes / AREA_SFT, 0.0005))
  ) +
  scale_fill_viridis_c(guide = FALSE) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  facet_wrap(~ election_name) +
  ggtitle(
    "Votes per mile in the Democratic Primary", 
    sprintf("Council District %s", DISTRICT)
  )

plot of chunk turnout_map
In fact, so few votes come from the industrial Southernmost tip of the city that let’s drop it from the maps. Sorry Navy Yard, but you’re ruining my scale.

View code
d2_subset <- sp_divs[sp_divs$council_district == DISTRICT,]
d2_subset <- d2_subset[
  d2_subset$WARD_DIVSN %in% 
    turnout_df$WARD_DIVSN[turnout_df$total_votes / turnout_df$AREA_SFT > 0.0001],
]

bbox <- gUnionCascaded(d2_subset)@bbox
## expand the bbox 20%for mapping
bbox <- rowMeans(bbox) + 1.2 * sweep(bbox, 1, rowMeans(bbox))

basemap <- get_map(bbox, maptype="toner-lite")

district_map <- ggmap(
  basemap, 
  extent="normal", 
  base_layer=ggplot(ggcouncil, aes(x=long, y=lat, group=group)),
  maprange = FALSE
) 
## without basemap:
# district_map <- ggplot(ggcouncil, aes(x=long, y=lat, group=group))

district_map <- district_map +
  theme_map_sixtysix() +
  coord_map(xlim=bbox[1,], ylim=bbox[2,])


sp_divs$council_district <- over(
  gCentroid(sp_divs, byid = TRUE), 
  sp_council
)$DISTRICT

First, let’s look at the results from five recent, compelling Democratic Primary races: 2015 City Council At Large, City Council District 2, and Mayor; 2016 President; and 2017 District Attorney. The maps below show the vote for the top two candidates in District 2 (except for City Council in 2015, where I use Helen Gym and Isaiah Thomas, who were 4th and 5th in the district, and 5th and 6th citywide.)

View code
candidate_votes <- candidate_votes %>%
  left_join(sp_divs@data %>% select(WARD_DIVSN, council_district))

## Choose the top two candidates in district 3
## Except for city council, where we choose Gym and Thomas
# candidate_votes %>%
#   group_by(OFFICE, year, CANDIDATE) %>%
#   summarise(
#     city_votes = sum(VOTES),
#     district_votes = sum(VOTES * (council_district == DISTRICT))
#   ) %>%
#   arrange(desc(city_votes)) %>%
#   filter(OFFICE == "DISTRICT ATTORNEY")

candidates_to_compare <- tribble(
  ~year, ~OFFICE, ~CANDIDATE, ~candidate_name, ~row,
  "2015", "COUNCIL AT LARGE", "HELEN GYM", "Helen Gym", 2,
  "2015", "COUNCIL AT LARGE", "ISAIAH THOMAS", "Isaiah Thomas", 1,
  "2015", "DISTRICT COUNCIL-2ND DISTRICT-DEM", "KENYATTA JOHNSON", "Kenyatta Johnson", 1,
  "2015", "DISTRICT COUNCIL-2ND DISTRICT-DEM", "ORI C FEIBUSH", "Ori Feibush", 2,
  "2015", "MAYOR", "JIM KENNEY", "Jim Kenney",  2,
  "2015", "MAYOR", "ANTHONY HARDY WILLIAMS", "Anthony Hardy Williams", 1,
  "2016", "PRESIDENT OF THE UNITED STATES", "BERNIE SANDERS", "Bernie Sanders", 2,
  "2016", "PRESIDENT OF THE UNITED STATES", "HILLARY CLINTON", "Hillary Clinton", 1,
  "2017", "DISTRICT ATTORNEY", "LAWRENCE S KRASNER", "Larry Krasner", 2,
  "2017", "DISTRICT ATTORNEY", "JOE KHAN","Joe Khan", 1
)

candidate_votes <- candidate_votes %>%
  left_join(races) %>%
  left_join(candidates_to_compare)

vote_adjustment <- function(pct_vote, office){
  ifelse(office == "COUNCIL AT LARGE", pct_vote * 4, pct_vote)
}

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(
        candidate_votes %>% filter(!is.na(row))
      ),
    aes(fill = 100 * vote_adjustment(pvote, OFFICE))
  ) +
  scale_fill_viridis_c("Percent of Vote") +
  theme(
    legend.position =  "bottom",
    legend.direction = "horizontal",
    legend.justification = "center"
  ) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  geom_label(
    data=candidates_to_compare %>% left_join(races),
    aes(label = candidate_name),
    group=NA,
    hjust=0, vjust=1,
    x=bbox[1,1],
    y=bbox[2,2]
  ) +
  facet_grid(row ~ election_name) +
  theme(strip.text.y = element_blank()) +
  ggtitle(
    sprintf("Candidate performance in District %s", DISTRICT), 
    "Percent of vote (times 4 for Council, times 1 for other offices)"
  )

plot of chunk proportion
Notice two things. First, the section of Point Breeze that dominated for Kenyatta Johnson in 2015, but also voted disproportionately for Isaiah Thomas, Anthony Hardy Williams, and Hillary Clinton. These are predominantly Black neighborhoods that didn’t bite on Helen Gym, Jim Kenney, or Bernie Sanders. Unlike in West Philly, Krasner did even better in Black Point Breeze than he did in the White, gentrified Graduate Hospital, where Joe Khan did unusually well. East Passyunk exhibited similar Krasner excitement to University City.

Second, note that Washington Avenue provides the stark boundary between pro-Kenyatta Point Breeze and pro-Gym, Feibush, and Kenney Graduate Hospital (Interested in this emergent boundary? Boy, have I got a dissertation for you!) Above Washington (along with the nub of East Passyunk that extends into the East of the district) both support the farther left challengers and turn out in force, although they didn’t support Sanders and Krasner as sharply as other gentrified parts of the city.

The district had one more coalition, hidden by these maps: Trump supporters.

View code
usp_2016 <- df_major %>%
  filter(
    election=="general"&
      year == 2016 &
      OFFICE == "PRESIDENT OF THE UNITED STATES" &
      CANDIDATE %in% c("DONALD J TRUMP", "HILLARY CLINTON")
    ) %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  group_by(WARD_DIVSN, CANDIDATE) %>%
  summarise(VOTES = sum(VOTES)) %>%
  group_by(WARD_DIVSN) %>%
  summarise(
    turnout = sum(VOTES),
    pdem = sum(VOTES * (CANDIDATE == "HILLARY CLINTON")) / sum(VOTES)
  )

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(
        usp_2016
      ),
    aes(fill = 100 * (1-pdem))
  ) +
  scale_fill_gradient2(
    "Percent for Donald Trump",
    low = strong_blue, mid = "white", high = strong_red, midpoint = 50
  )+
  theme(
    legend.position =  "bottom",
    legend.direction = "horizontal",
    legend.justification = "center"
  ) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  expand_limits(fill = 80) +
  ggtitle("South of Passyunk went for Trump", "Percent of two-party vote in the 2016 Presidential election")

plot of trump support

South of Passyunk voted Trump, with up to 60% of the vote! Coupled with parts of the Northeast, this represents Philadelphia’s Trump Democrats. We’ll treat them separately.

To simplify the analysis, let’s divide the District into coalitions. We’ll use four: “Johnson’s Base” of Point Breeze, “Gentrified Challengers” of Graduate Hospital and East Passyunk, “Southwest Philly”, which supported Johnson but not homogenously, and “Trumpist South Philly”, below Passyunk.

View code

xcand <- "Kenyatta Johnson"
ycand <- "Larry Krasner"

## Everything west of the Schuylkill call Southwest.
div_centroids <- gCentroid(sp_divs[sp_divs$council_district == DISTRICT,], byid=TRUE)
sw_divs <- attr(div_centroids@coords, "dimnames")[[1]][div_centroids@coords[,1] < -75.20486]

## Pull out the places trump won
trump_winners <- usp_2016 %>%
  inner_join(sp_divs@data %>% filter(council_district == DISTRICT)) %>%
  filter(pdem < 0.5)

district_categories <- candidate_votes %>%
  filter(!is.na(candidate_name)) %>%
  group_by(WARD_DIVSN) %>%
  mutate(votes_2016 = total_votes[candidate_name == 'Bernie Sanders']) %>%
  group_by() %>%
    filter(
      council_district == DISTRICT & 
        candidate_name %in% c(xcand, ycand)
    ) %>%
    group_by(WARD_DIVSN, votes_2016) %>%
    summarise(
      x_pvote = pvote[candidate_name == xcand],
      y_pvote = pvote[candidate_name == ycand]
    ) %>%
  mutate(
    is_sw = WARD_DIVSN %in% sw_divs,
    trump_winner = WARD_DIVSN %in% trump_winners$WARD_DIVSN,
    cat = ifelse(is_sw, "Southwest", ifelse(trump_winner, "Trumpists", "East"))
  ) 

# district_categories <- district_categories %>% left_join(turnout_wide, by = "WARD_DIVSN")

ggplot(
  district_categories,
  aes(x = 100 * x_pvote, y = 100 * y_pvote)
  # aes(x = 100 * x_pvote, y = (votes_2017 - votes_2015) * 5280^2)
) +
  geom_point(aes(size = votes_2016), alpha = 0.3) +
  scale_size_area("Total Votes in 2016")+
  theme_sixtysix() +
  xlab(sprintf("Percent of Vote for %s", xcand)) +
  ylab("Change in Votes/Mile, 2015 - 2017") + 
  ylab(sprintf("Percent of Vote for %s", ycand)) +
  coord_fixed() +
  geom_abline(slope = 3, intercept =  -130) +
  # geom_hline(yintercept=0) +
  # geom_vline(xintercept=60, linetype="dashed") +
  # geom_abline(slope = 100, intercept =  -7000) +
  geom_text(
    data = data.frame(cat = rep("East", 2)),
    x = c(28, 80),
    y = c(70, 10),
    hjust = 0.5,
    label = c("Challenger\nBase", "Johnson\nBase"),
    color = c(strong_green, strong_purple),
    fontface="bold"
  ) +
  facet_wrap(~cat)+
  ggtitle("Divisions' vote", sprintf("District %s Democratic Primary", DISTRICT))

plot of chunk scatter_bernie_gym
The Vote for Krasner is only weakly negatively correlated with the vote for Johnson, surprisingly. I’ve drawn an arbitrary line that appears to divide the clusters. We’ll call the divisions in the East above the line the Gentrified Challengers, and those below the line Johnson’s Base.

Here’s the map of the cohorts that this categorization gives us.

View code
district_categories$category <- with(
  district_categories,
  ifelse(
    cat != "East", cat,
    ifelse(y_pvote > 3 * x_pvote - 1.30, "Gentrified Challengers", "Johnson Base")
  )
)

cohort_colors <- c(
      "Johnson Base" = strong_purple,
      "Gentrified Challengers" = strong_green,
      "Southwest" = strong_orange,
      "Trumpists" = strong_red
    )

district_map + 
  geom_polygon(
    data = ggdivs %>% 
      left_join(district_categories) %>% 
      filter(!is.na(category)),
    aes(fill = category)
  ) +
  scale_fill_manual(
    "Cohort",
    values=cohort_colors
  ) +
  ggtitle(sprintf("District %s neighborhood divisions", DISTRICT))+
  theme(legend.position = c(0,1), legend.justification = c(0,1))

plot of chunk category_map
Looks reasonable.

How did the candidates do in each of the sections? The boundaries separate drastic performance splits.

View code
neighborhood_summary <- candidate_votes %>% 
  inner_join(candidates_to_compare) %>%
  group_by(candidate_name, election_name) %>%
  mutate(
    citywide_votes = sum(VOTES),
    citywide_pvote = 100 * sum(VOTES) / sum(total_votes)
  ) %>%
  filter(council_district == DISTRICT) %>%
  left_join(district_categories) %>%
  group_by(candidate_name, citywide_votes, citywide_pvote, election_name, category) %>%
  summarise(
    votes = sum(VOTES),
    pvote = 100 * sum(VOTES) / sum(total_votes),
    total_votes = sum(total_votes)
  ) %>%
  group_by(candidate_name, election_name) %>%
  mutate(
    district_votes = sum(votes),
    district_pvote = 100 * sum(votes) / sum(total_votes)
  ) %>% select(
    election_name, candidate_name, citywide_pvote, district_pvote, category, pvote, total_votes
  ) %>%
  gather(key="key", value="value", pvote, total_votes) %>%
  unite("key", category, key) %>%
  spread(key, value)
  

neighborhood_summary %>%
  knitr::kable(
    digits=0, 
    format.args=list(big.mark=','),
    col.names=c("Election", "Candidate", "Citywide %", sprintf("District %s %%", DISTRICT), "Gentrified Challengers %", "Gentrified Challengers Turnout", "Johnson Base %", "Johnson Base Turnout", "Southwest %", "Southwest Turnout", "Trumpist %", "Trumpist Turnout")
  )
Election Candidate Citywide % District 2 % Gentrified Challengers % Gentrified Challengers Turnout Johnson Base % Johnson Base Turnout Southwest % Southwest Turnout Trumpist % Trumpist Turnout
2015 City Council At Large Helen Gym 8 9 14 24,471 7 25,139 4 15,569 5 5,338
2015 City Council At Large Isaiah Thomas 7 7 5 24,471 10 25,139 8 15,569 2 5,338
2015 Council 2nd District Kenyatta Johnson 62 62 44 6,669 79 9,508 64 5,894 36 2,057
2015 Council 2nd District Ori Feibush 38 38 55 6,669 21 9,508 36 5,894 64 2,057
2015 Mayor Anthony Hardy Williams 26 30 11 7,606 41 10,127 45 6,740 2 2,580
2015 Mayor Jim Kenney 56 56 70 7,606 46 10,127 44 6,740 89 2,580
2016 President Bernie Sanders 37 37 39 11,702 38 14,293 29 9,224 47 2,051
2016 President Hillary Clinton 63 63 60 11,702 62 14,293 71 9,224 52 2,051
2017 District Attorney Joe Khan 20 23 33 6,985 18 6,569 13 3,354 14 982
2017 District Attorney Larry Krasner 38 39 43 6,985 41 6,569 32 3,354 18 982

The turnout splits are fascinating. The Johnson Base represented a consistent 37ish percent of the votes, dominating the election in 2015 and 2016, but surpassed by the Gentrified Challengers’ 39% in 2017. Still, the Southwest typically represents 25% of the votes (this fell to 19% in 2017), so Johnson’s Base combined with the Southwest made up a strong 63% of the 2016 vote, and 58% of the 2017 vote.

View code
cohort_turnout <- neighborhood_summary %>%
  group_by() %>%
  filter(election_name %in% c("2015 Mayor", "2016 President", "2017 District Attorney")) %>%
  select(election_name, ends_with("_total_votes")) %>%
  gather("cohort", "turnout", -election_name) %>%
  unique() %>%
  mutate(
    year = substr(election_name, 1, 4),
    cohort = gsub("^(.*)_total_votes", "\\1", cohort)
  ) %>%
  group_by(year) %>%
  mutate(pct_turnout = turnout / sum(turnout))
  
ggplot(cohort_turnout, aes(x=year, y=100*pct_turnout)) +
  geom_line(aes(group=cohort, color=cohort), size=2) +
  geom_point(aes(color=cohort), size=4) +
  scale_color_manual(values=cohort_colors, guide=FALSE) +
  theme_sixtysix() +
  expand_limits(y=0) +
  expand_limits(x=4)+
  geom_text(
    data = cohort_turnout %>% filter(year == 2017),
    aes(label = cohort, color = cohort),
    x = 3.05,
    fontface="bold",
    hjust = 0
  ) +
  ylab("Percent of District 2's votes") +
  xlab("") +
  ggtitle(
    "Cohorts' electoral strength", "Percent of District 2's votes in the Democratic Primary"
  )

plot of chunk turnout by cohort

How does the combination of (a) Johnson’s Base sheer size but (b) the Gentrifiers’ surge in voting impact the election? It comes down to percent of the vote in each region. In 2015, Kenyatta won 44% in the Challenger Base even as he dominated his own Base and the Southwest, 79 and 64%. Feibush won 64% from the South Philly Trumpists. Vidas, who is a very different candidate from Feibush (to put it mildly), would have to do much, much better in the Gentrified regions, and hope Johnson’s dominance of Point Breeze has fallen.

The relative power of West and Southwest and University City

How much does the power shift between the two cohorts? Let’s do some math.

How much does a candidate need from each of the sections to win? Let t_i be the relative turnout in section i, defined as the proportion of total votes. So in the 2017 District Attorney Race, t_i was 0.39 for the Gentrified Challengers, and 0.37 for the Johnson Base. Let p_ic be the proportion of the vote received by candidate c in section i, so in 2017, p is 0.41 for Krasner in the Johnson Base.

Then a candidate wins a two-way race whenever the turnout-weighted proportion of their vote is greater than 0.5: sum_over_i(t_i p_ic) > 0.5.

Since we’ve divided District 2 into four sections, it’s hard to plot on a two-way axis. For simplicity, I’ll combine the Johnson Base with Southwest Philly, and the Gentrified Challengers with the Trumpists (these are in my opinion the likely race-correlated dynamics that will play out). On the x-axis, let’s map a candidate’s percent of the vote in the Gentrifiers + Trumpists, and on the y, a candidate’s percent of the vote in Southwest + the Johnson Base (assuming a two-person race). The candidate wins whenever the average of their proportions, weighted by t, is greater than 50%. The dashed lines show the win boundaries; candidates to the top-right of the lines win. Turnout matters less than in District 2 than in District 3 because it swings less; they didn’t experience the Krasner bump in 2017.

I’ll plot only the two-candidate vote for the top two candidates in the district for each race, to emulate a two-person race. (For City Council in 2015, I use Helen Gym and Isaiah Thomas, who were 4th and 5th in the district, and 5th and 6th citywide.)

View code
get_line <- function(x_total_votes, y_total_votes){
  ## solve p_x t_x+ p_y t_y > 50
  tot <- x_total_votes + y_total_votes
  tx <- x_total_votes / tot
  ty <- y_total_votes / tot

  slope <- -tx / ty
  intercept <- 50 / ty  # use 50 since proportions are x100
  c(intercept, slope)
}

line_2017 <- with(
  neighborhood_summary,
  get_line(
    (`Gentrified Challengers_total_votes` + Trumpists_total_votes)[candidate_name == "Larry Krasner"],
    (`Johnson Base_total_votes` + `Southwest_total_votes`)[candidate_name == "Larry Krasner"]
  )
)

line_2015 <- with(
  neighborhood_summary,
  get_line(
    (`Gentrified Challengers_total_votes` + Trumpists_total_votes)[candidate_name == "Jim Kenney"],
    (`Johnson Base_total_votes` + `Southwest_total_votes`)[candidate_name == "Jim Kenney"]
  )
)

## get the two-candidate vote
neighborhood_summary <- neighborhood_summary %>%
  group_by(election_name)  %>% 
  mutate(
    challenger_pvote_2cand = (
      `Gentrified Challengers_pvote` + Trumpists_pvote
      ) / sum(`Gentrified Challengers_pvote` + Trumpists_pvote),
    kenyatta_pvote_2cand = (`Southwest_pvote` + `Johnson Base_pvote`)/sum(`Southwest_pvote` + `Johnson Base_pvote`)
  )


library(ggrepel)

ggplot(
  neighborhood_summary,
  aes(
    x=100*challenger_pvote_2cand,
    y=100*kenyatta_pvote_2cand
  )
) +
  geom_point() +
  geom_text_repel(aes(label=candidate_name)) +
  geom_abline(
    intercept = c(line_2015[1], line_2017[1]),
    slope = c(line_2015[2], line_2017[2]),
    linetype="dashed"
  ) +
  coord_fixed() + 
  scale_x_continuous(
    "Gentrified Challenger + Trumpist percent of vote",
    breaks = seq(0,100,10)
  ) +
  scale_y_continuous(
    "Johnson Base + Southwest percent of vote",
    breaks = seq(0, 100, 10)
  ) +
  annotate(
    geom="text",
    label=paste(c(2015, 2017), "turnout"),
    x=c(10, 8),
    y=c(
      line_2015[1] + 10 * line_2015[2],
      line_2017[1] + 8 * line_2017[2]
    ),
    hjust=0,
    vjust=-0.2,
    angle = atan(c(line_2015[2], line_2017[2])) / pi * 180,
    color="grey40"
  )+
  annotate(
    geom="text",
    x = 80,
    y=75,
    label="Candidate wins",
    fontface="bold",
    color = strong_green
  ) +
  geom_hline(yintercept = 50, color="grey50") +
  geom_vline(xintercept = 50, color="grey50")+
  expand_limits(x=100, y=80)+
  theme_sixtysix() +
  ggtitle(
    "The relative strengths of District 2 neighborhoods",
    "Candidates to the top-right of the lines win. Points are two-candidate vote."
  )

plot of chunk win_scatter

Hillary Clinton, Larry Krasner, and Jim Kenney won the two-way votes in all sections. Kenyatta lost the Gentrified + Trumpist vote 59-41, but dominated Point Breeze and Southwest Philly. (Notice the points don’t match the table above because these are two-candidate votes.)

What would be Vidas’s path to victory? Helen Gym looks like a prototype (remember that there were actually 16 candidates for five spots, so this head-to-head analysis is hypothetical). Developer Ori Feibush didn’t do nearly well enough in Grad Hospital and East Passyunk to win. If Vidas burnishes more progressive credentials, and pushes that percentage up to 80%, then she could win even if Johnson doesn’t lose any support in his base.

Looking to May

We’re left in a grey area. There are reasons to believe that the recent scandals could drastically change Johnson’s support from 2015, but without polling, we have no way to tell exactly how much. It would take a huge change from 2015 for him to lose, but the combination of scandal and not running against Feibush could be that change.

Up next, I’ll stick with scandal-plagued incumbents and look at Henon’s District 6. Stay tuned!

The neighborhoods that decide Council District 3

Could Jannie Lose?

Jannie Blackwell, the six term councilmember from West Philly’s District 3, is being challenged by Jamie Gauthier. The race appears to be shaping up as a reform-minded challenger against a powerful longtime incumbent, and it’s generated some serious buzz due to recent protests and homophobic remarks. Could it really be close?

More generally, I’m curious about the way Philadelphia’s gentrification and the 2016 election have changed electoral power structures. Even in 2015, Helen Gym won largely on the votes of Center City and the ring around it. But then 2016 happened, and turnout in those neighborhoods reached unprecedented heights. Exactly how powerful is that cohort? And while they’re strong citywide, have they taken over specific districts, to be able to dictate outcomes there?

Blackwell hasn’t faced a primary challenger since 1999, so we don’t have any evidence on her individual strength. Let’s instead look at recent competitive elections that could illustrate the neighborhood’s relative views.

What are the neighborhood cohorts that will decide District 3? Is the Krasner/Gym base strong enough on its own to dictate the election, or is the traditionally decisive West and Southwest Philly base still decisive?

District 3’s voting blocks

In the last three Democratic primaries, District 3 has displayed two clear voting blocks: University City and farther West/Southwest Philly.

View code
library(tidyverse)
library(rgdal)
library(rgeos)
library(sp)
library(ggmap)

sp_council <- readOGR("../../../data/gis/city_council/Council_Districts_2016.shp", verbose = FALSE)
sp_council <- spChFIDs(sp_council, as.character(sp_council$DISTRICT))

sp_divs <- readOGR("../../../data/gis/2016/2016_Ward_Divisions.shp", verbose = FALSE)
sp_divs <- spChFIDs(sp_divs, as.character(sp_divs$WARD_DIVSN))
sp_divs <- spTransform(sp_divs, CRS(proj4string(sp_council)))

load("../../../data/processed_data/df_major_2017_12_01.Rda")

ggcouncil <- fortify(sp_council) %>% mutate(council_district = id)
ggdivs <- fortify(sp_divs) %>% mutate(WARD_DIVSN = id)
View code
races <- tribble(
  ~year, ~OFFICE, ~office_name,
  "2015", "MAYOR", "Mayor",
  "2015", "COUNCIL AT LARGE", "City Council",
  "2016", "PRESIDENT OF THE UNITED STATES", "President",
  "2017", "DISTRICT ATTORNEY", "District Attorney"
) %>% mutate(election_name = paste(year, office_name))

candidate_votes <- df_major %>% 
  filter(election == "primary" & PARTY == "DEMOCRATIC") %>%
  inner_join(races %>% select(year, OFFICE)) %>%
  mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>%
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  mutate(
    total_votes = sum(VOTES),
    pvote = VOTES / sum(VOTES)
  ) %>% 
  group_by()

turnout_df <- candidate_votes %>%
  filter(OFFICE != "COUNCIL AT LARGE") %>% 
  group_by(WARD_DIVSN, OFFICE, year, election) %>%
  summarise(total_votes = sum(VOTES)) %>%
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, AREA_SFT)
  )

turnout_df$AREA_SFT <- asnum(turnout_df$AREA_SFT)

The third council district covers West Philly, from the Schuylkill River to the city line.

View code
get_labpt_df <- function(sp){
  mat <- sapply(sp@polygons, slot, "labpt")
  df <- data.frame(x = mat[1,], y=mat[2,])
  return(
    cbind(sp@data, df)
  )
}

ggplot(ggcouncil, aes(x=long, y=lat)) +
  geom_polygon(
    aes(group=group),
    fill = strong_green, color = "white", size = 1
  ) +
  geom_text(
    data = get_labpt_df(sp_council),
    aes(x=x,y=y,label=DISTRICT)
  ) +
  theme_map_sixtysix() +
  coord_map() +
  ggtitle("Council Districts")

plot of chunk council_map

View code
DISTRICT <- "3"
sp_district <- sp_council[row.names(sp_council) == DISTRICT,]

bbox <- sp_district@bbox
## expand the bbox 20%for mapping
bbox <- rowMeans(bbox) + 1.2 * sweep(bbox, 1, rowMeans(bbox))

basemap <- get_map(bbox, maptype="toner-lite")

district_map <- ggmap(
  basemap, 
  extent="normal", 
  base_layer=ggplot(ggcouncil, aes(x=long, y=lat, group=group)),
  maprange = FALSE
) +
  theme_map_sixtysix() +
  coord_map(xlim=bbox[1,], ylim=bbox[2,])

sp_divs$council_district <- over(
  gCentroid(sp_divs, byid = TRUE), 
  sp_council
)$DISTRICT

sp_divs$in_bbox <- sapply(
  sp_divs@polygons,
  function(p) {
    coords <- p@Polygons[[1]]@coords
    any(
      coords[,1] > bbox[1,1] &
      coords[,1] < bbox[1,2] &
      coords[,2] > bbox[2,1] &
      coords[,2] < bbox[2,2] 
    )
  }
)

ggdivs <- ggdivs %>% 
  left_join(
    sp_divs@data %>% select(WARD_DIVSN, in_bbox)
  )

district_map +
  geom_polygon(
    aes(alpha = (id == DISTRICT)),
    fill="black",
    color = "grey50",
    size=2
  ) +
  scale_alpha_manual(values = c(`TRUE` = 0.2, `FALSE` = 0), guide = FALSE) +
  ggtitle(sprintf("Council District %s", DISTRICT))

plot of chunk district_map

First, let’s look at the results from four recent, compelling Democratic Primary races: 2015 City Council At Large and Mayor, 2016 President, and 2017 District Attorney. The maps below show the vote for the top two candidates in District 3 (except for City Council in 2015, where I use Helen Gym and Isaiah Thomas, who were 4th and 5th in the district, and 5th and 6th citywide.)

View code
candidate_votes <- candidate_votes %>%
  left_join(sp_divs@data %>% select(WARD_DIVSN, council_district))

## Choose the top two candidates in district 3
# Except for city council, where we choose Gym and Thomas
# candidate_votes %>% 
#   group_by(OFFICE, year, CANDIDATE) %>% 
#   summarise(
#     city_votes = sum(VOTES), 
#     district_votes = sum(VOTES * (council_district == DISTRICT))
#   ) %>% 
#   arrange(desc(district_votes)) %>%
#   filter(OFFICE == "COUNCIL AT LARGE")

candidates_to_compare <- tribble(
  ~year, ~OFFICE, ~CANDIDATE, ~candidate_name, ~row,
  "2015", "COUNCIL AT LARGE", "HELEN GYM", "Helen Gym", 1,
  "2015", "COUNCIL AT LARGE", "ISAIAH THOMAS", "Isaiah Thomas", 2,
  "2015", "MAYOR", "JIM KENNEY", "Jim Kenney",  1,
  "2015", "MAYOR", "ANTHONY HARDY WILLIAMS", "Anthony Hardy Williams", 2,
  "2016", "PRESIDENT OF THE UNITED STATES", "BERNIE SANDERS", "Bernie Sanders", 1,
  "2016", "PRESIDENT OF THE UNITED STATES", "HILLARY CLINTON", "Hillary Clinton", 2,
  "2017", "DISTRICT ATTORNEY", "LAWRENCE S KRASNER", "Larry Krasner", 1,
  "2017", "DISTRICT ATTORNEY", "TARIQ KARIM EL SHABAZZ","Tariq Karim El Shabazz", 2
)

candidate_votes <- candidate_votes %>%
  left_join(races) %>%
  left_join(candidates_to_compare)

vote_adjustment <- function(pct_vote, office){
  ifelse(office == "COUNCIL AT LARGE", pct_vote * 4, pct_vote)
}

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(
        candidate_votes %>% filter(!is.na(row))
      ),
    aes(fill = 100 * vote_adjustment(pvote, OFFICE))
  ) +
  scale_fill_viridis_c("Percent of Vote") +
  theme(
    legend.position =  "bottom",
    legend.direction = "horizontal",
    legend.justification = "center"
  ) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  geom_label(
    data=candidates_to_compare %>% left_join(races),
    aes(label = candidate_name),
    group=NA,
    hjust=0, vjust=1,
    x=-75.258,
    y=39.985
  ) +
  facet_grid(row ~ election_name) +
  theme(strip.text.y = element_blank()) +
  ggtitle(
    sprintf("Candidate performance in District %s", DISTRICT), 
    "Percent of vote (times 4 for Council, times 1 for other offices)"
  )

plot of chunk proportion
Notice two things. First, these competitive elections all split along the same boundaries: University City versus farther West and Southwest Philly. The candidates’ overall results were different (Sanders lost the district, Krasner won), but their relative strengths were exactly the same place. Demographically, the split is obvious: University City is predominantly White and wealthier, farther West is predominantly Black and has lower incomes. Even though Krasner did well across the city, and Shabazz poorly, Krasner did disproportionately well in University City, and Shabazz dispropotionately well farther West and Southwest.

Turnout is a more complicated story.

View code
# hist(turnout_df$total_votes / turnout_df$AREA_SFT)

turnout_df <- turnout_df %>%
  left_join(races)

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(turnout_df, by =c("id" = "WARD_DIVSN")),
    aes(fill = pmin(total_votes / AREA_SFT, 0.0005))
  ) +
  scale_fill_viridis_c(guide = FALSE) +
  geom_polygon(
    fill=NA,
    color = "white",
    size=1
  ) +
  facet_wrap(~ election_name) +
  ggtitle(
    "Votes per mile in the Democratic Primary", 
    sprintf("Council District %s", DISTRICT)
  )

plot of chunk turnout_map
The 2017 election was completely different from 2015. In 2015, we saw the West and Southwest Philly neighborhoods dominate the vote, and decide the election. In 2017, University City (really, Cedar Park and Spruce Hill) boomed for Krasner. While Gym, Kenney, and Sanders all monopolized the University City percent of the vote, only Krasner multiplied that effect by monopolizing the turnout.

The change in votes per mile from 2015 to 2017 illustrates that starkly.

View code
turnout_wide <- turnout_df %>%
  group_by() %>%
  mutate(
    votes_per_sf = total_votes / AREA_SFT,
    key = paste0("votes_", year)
  ) %>%
  select(WARD_DIVSN, key, votes_per_sf) %>%
  spread(key = key, value = votes_per_sf)

district_map +
  geom_polygon(
    data = ggdivs %>%
      filter(in_bbox) %>%
      left_join(turnout_wide),
    aes(
      fill = (votes_2017 - votes_2015)*5280^2
    )
  ) +
  scale_fill_gradient2(
    "Change in votes per mile\n  2015 - 2017",
    low=strong_orange,
    mid="white",
    high=strong_purple,
    midpoint=0
  ) +
  geom_polygon(
    fill=NA,
    color = "black",
    size=1
  )  +
  theme(legend.position = "bottom", legend.direction = "horizontal") +
  ggtitle(
    sprintf("Change in votes per mile, District %s", DISTRICT),
    "Orange: More votes in 2015, Purple: More in 2017"
  )

plot of chunk relative_turnout_15_17

To simplify the analysis, let’s divide the District into the two distinct coalitions: the Clinton/Hardy Williams “West & Southwest”, and the Krasner/Sanders “University City”. While they’re obvious on the map, we need a rule to split them up; ideally, there would be natural clusters to divide them. Using the simplistic division based on whether the average Krasner/Sanders vote was greater than 50% is surprisingly useful:

View code
district_categories <- candidate_votes %>% 
    filter(
      council_district == DISTRICT & 
        candidate_name %in% c("Larry Krasner", "Bernie Sanders")
    ) %>%
    group_by(WARD_DIVSN) %>%
    mutate(votes_2016 = total_votes[year == 2016]) %>%
    select(WARD_DIVSN, votes_2016, candidate_name, pvote) %>%
    spread(key=candidate_name, value=pvote)

ggplot(
  district_categories,
  aes(x = 100 * `Bernie Sanders`, y = 100 * `Larry Krasner`)
) +
  geom_point(aes(size = votes_2016), alpha = 0.7) +
  scale_size_area("Total Votes in 2016")+
  theme_sixtysix() +
  xlab("Percent of Vote for Bernie Sanders") +
  ylab("Percent of Vote for Larry Krasner") +
  coord_fixed() + 
  geom_abline(slope = -1, intercept =  100) +
  annotate(
    geom = "text",
    x = c(35, 20),
    y = c(15, 87),
    hjust = 0,
    label = c("West & Southwest", "University City"),
    color = c(strong_green, strong_purple),
    fontface="bold"
  ) +
  ggtitle("Divisions' vote", sprintf("District %s Democratic Primary", DISTRICT))

plot of chunk scatter_bernie_gym
We’ll call the divisions above the line University City, and those below the line West & Southwest.

Here’s the map of the cohorts that this categorization gives us.

View code
district_categories$category <- with(
  district_categories,
  (`Bernie Sanders` + `Larry Krasner`) > 1.0
)
district_categories$cat_name <- ifelse(
  district_categories$category,
  "University City",
  "West & Southwest"
)

district_map + 
  geom_polygon(
    data = ggdivs %>% 
      left_join(district_categories) %>% 
      filter(!is.na(cat_name)),
    aes(fill = cat_name)
  ) +
  scale_fill_manual(
    "",
    values = c("University City" = strong_purple, "West & Southwest" = strong_green)
  ) +
  ggtitle(sprintf("District %s neighborhood divisions", DISTRICT))

plot of chunk category_map
Looks reasonable.

How did the candidates do in each of the two sections? The boundary separates drastic performance splits.

View code
neighborhood_summary <- candidate_votes %>% 
  inner_join(candidates_to_compare) %>%
  group_by(candidate_name, election_name) %>%
  mutate(
    citywide_votes = sum(VOTES),
    citywide_pvote = 100 * sum(VOTES) / sum(total_votes)
  ) %>%
  filter(council_district == DISTRICT) %>%
  left_join(district_categories) %>%
  group_by(candidate_name, citywide_votes, citywide_pvote, election_name, cat_name) %>%
  summarise(
    votes = sum(VOTES),
    pvote = 100 * sum(VOTES) / sum(total_votes),
    total_votes = sum(total_votes)
  ) %>%
  group_by(candidate_name, election_name) %>%
  mutate(
    district_votes = sum(votes),
    district_pvote = 100 * sum(votes) / sum(total_votes)
  ) %>% select(
    election_name, candidate_name, citywide_pvote, district_pvote, cat_name, pvote, total_votes
  ) %>%
  gather(key="key", value="value", pvote, total_votes) %>%
  unite("key", cat_name, key) %>%
  spread(key, value)


neighborhood_summary %>%
  knitr::kable(
    digits=0, 
    format.args=list(big.mark=','),
    col.names=c("Election", "Candidate", "Citywide %", sprintf("District %s %%", DISTRICT), "University City %", "University City Turnout", "West & Southwest %", "West & Southwest Turnout")
  )

 

Election Candidate Citywide % District 3 % University City % University City Turnout West & Southwest % West & Southwest Turnout
2015 City Council Helen Gym 8 8 16 18,521 5 47,400
2015 City Council Isaiah Thomas 7 8 7 18,521 8 47,400
2015 Mayor Anthony Hardy Williams 26 48 24 5,738 55 19,335
2015 Mayor Jim Kenney 56 39 62 5,738 33 19,335
2016 President Bernie Sanders 37 39 59 12,376 30 27,991
2016 President Hillary Clinton 63 61 41 12,376 70 27,991
2017 District Attorney Larry Krasner 38 51 73 7,125 36 11,113
2017 District Attorney Tariq Karim El Shabazz 12 15 5 7,125 22 11,113

Gym won 16% in University City, but only 5% in West & Southwest; Thomas ran an even 7 and 8%, respectively. Kenney won 62% in University City and only 33% in West and Southwest, Hardy Williams flipped that for 24 and 55%. Krasner won an astounding 73% of the vote in University City (in a crowded race!), and only 36% in West and Southwest, though that was still good enough to win the neighborhood. El Shabazz won 5 and 22%.

Also, notice the dramatic change in relative turnout. In the 2015 Mayoral race, West & Southwest had 3.4 times the vote of University City. The dramatic turnout swing of 2017 shrunk that to 1.6. West and Southwest still hold most of the voters (among substantially more households), but the relative proportions needed shift.

The relative power of West and Southwest and University City

How much does the power shift between the two cohorts? Let’s do some math.

How much does a candidate need from each of the sections to win? Let t_i be the relative turnout in section i, defined as the proportion of total votes. So in the 2017 District Attorney Race, t_i was 0.39 for University City, and 0.61 for West & Southwest. Let p_ic be the proportion of the vote received by candidate c in section i, so in 2017, p is 0.73 for Krasner in University City.

Then a candidate wins a two-way race whenever the turnout-weighted proportion of their vote is greater than 0.5: sum_over_i(t_i p_ic) > 0.5.

Since we’ve divided District 3 into only 2 sections, we can plot this on a two-way plot. On the x-axis, let’s map a candidate’s percent of the vote in University City, and on the y, a candidate’s percent of the vote in West & Southwest (assuming a two-person race). The candidate wins whenever the average of their proportions, weighted by \(\tilde{t}\) is greater than 50%. If the turnout looks like 2015, West & Southwest easily carry the District; if it looks like 2017, the sections carry nearly equal weight. The dashed lines show the win boundaries; candidates to the top-right of the lines win.

I’ll plot only the two-candidate vote for the top two candidates in the district for each race, to emulate a two-person race. (For City Council in 2015, I use Helen Gym and Isaiah Thomas, who were 4th and 5th in the district, and 5th and 6th citywide.)

View code
get_line <- function(x_total_votes, y_total_votes){
  ## solve p_x t_x+ p_y t_y > 50
  tot <- x_total_votes + y_total_votes
  tx <- x_total_votes / tot
  ty <- y_total_votes / tot

  slope <- -tx / ty
  intercept <- 50 / ty  # use 50 since proportions are x100
  c(intercept, slope)
}

line_2017 <- with(
  neighborhood_summary,
  get_line(
    `University City_total_votes`[candidate_name == "Larry Krasner"],
    `West & Southwest_total_votes`[candidate_name == "Larry Krasner"]
  )
)

## get the two-candidate vote
neighborhood_summary <- neighborhood_summary %>%
  group_by(election_name)  %>% 
  mutate(
    ucity_pvote_2cand = `University City_pvote` / sum(`University City_pvote`),
    wsw_pvote_2cand = `West & Southwest_pvote`/sum(`West & Southwest_pvote`)
  )

line_2015 <- with(
  neighborhood_summary,
  get_line(
    `University City_total_votes`[candidate_name == "Jim Kenney"],
    `West & Southwest_total_votes`[candidate_name == "Jim Kenney"]
  )
)

library(ggrepel)

ggplot(
  neighborhood_summary,
  aes(
    x=100*ucity_pvote_2cand,
    y=100*wsw_pvote_2cand
  )
) +
  geom_point() +
  geom_text_repel(aes(label=candidate_name)) +
  geom_abline(
    intercept = c(line_2015[1], line_2017[1]),
    slope = c(line_2015[2], line_2017[2]),
    linetype="dashed"
  ) +
  coord_fixed() + 
  scale_x_continuous(
    "University City percent of vote",
    breaks = seq(0,100,10)
  ) +
  scale_y_continuous(
    "West & Southwest percent of vote",
    breaks = seq(0, 100, 10)
  ) +
  annotate(
    geom="text",
    label=paste(c(2015, 2017), "turnout"),
    x=c(10, 8),
    y=c(
      line_2015[1] + 10 * line_2015[2],
      line_2017[1] + 8 * line_2017[2]
    ),
    hjust=0,
    vjust=-0.2,
    angle = atan(c(line_2015[2], line_2017[2])) / pi * 180,
    color="grey40"
  )+
  annotate(
    geom="text",
    x = 70,
    y=75,
    label="Candidate wins",
    fontface="bold",
    color = strong_green
  ) +
  geom_hline(yintercept = 50, color="grey50") +
  geom_vline(xintercept = 50, color="grey50")+
  expand_limits(x=100, y=80)+
  theme_sixtysix() +
  ggtitle(
    "The relative strength of W & SW Philly and U City",
    "Candidates to the top-right of the lines win."
  )

plot of chunk win_scatter

Hillary Clinton and Larry Krasner won the district in a landslide, with Clinton winning despite losing University City to Sanders. Helen Gym and Jim Kenney were in the turnout-dependent zone: they would win the district if turnout looked like 2017, and lose it if turnout looked like 2015 (and vice versa for Hardy Williams and Thomas).

So could a candidate who monopolized University City win? Maybe, but it’s hard. If turnout looks like 2017, then a candidate who wins 70% of the University City vote still needs to win 37% of the West and Southwest vote. If the turnout looks like 2015, the required W/SW vote jumps to 44. Clinton and Krasner pulled off dominant victories that would win in any turnout climate; Hardy Williams, Kenney, El Shabazz, and Gym saw the neighborhoods’ turnouts be decisive.

Looking to May

I don’t know how Jamie Gauthier will fare in University City or in West & Southwest Philly, but my hunch is that she’s seeking the reformist, University City lane. But that’s a hard lane to win in. Even if she achieves Gym and Kenney percentages, she would need to additionally inspire turnout the way that Krasner did. Alternatively, she needs to pull enough support from West and Southwest; significantly more than Gym and Kenney did. It’s possible, but a steep climb.

What At Large City Councilors most polarized the vote?

May’s primary will include elections for Philadelphia City Council. The council is constituted of 17 councilors, ten of whom are voted in by specific districts and seven of whom are At Large, voted in by the city as a whole. Of those seven at large, only five can come from the same party. In practice means that five Democrats will win this primary, and then win landslide elections in November.

In advance of May, I’m going to be looking at what it takes to win a Democratic City Council At Large seat. Today, let’s look at how polarizing candidates are.

[Note: I’m starting today making my blog posts in RMarkdown. Click the View Code to see the R code!]

View code
## You can access the data at: 
## https://github.com/jtannen/jtannen.github.io/tree/master/data
# load("df_major_2017_12_01.Rda")

df_major$CANDIDATE <- gsub("\\s+", " ", df_major$CANDIDATE)
df_major$PARTY[df_major$PARTY == "DEMOCRATIC"] <- 'DEMOCRAT'

df_major <- df_major %>% 
  filter(
    election == "primary" &
      OFFICE == "COUNCIL AT LARGE" &
      PARTY %in% c("DEMOCRAT")
  )

df_total <- df_major %>% 
  group_by(CANDIDATE, year, PARTY) %>%
  summarise(votes = sum(VOTES)) %>%
  group_by(year, PARTY) %>%
  arrange(desc(votes)) %>%
  mutate(rank = rank(desc(votes)))

div_votes <- df_major %>%
  group_by(WARD16, DIV16, OFFICE, year) %>%
  summarise(div_votes = sum(VOTES))

Measuring Vote Polarization

One way to measure polarization is using the Gini coefficient, common in studying inequality. Suppose for each candidate we line up the precincts in order of their percent of the vote. We then move down the precincts, adding up the total voters and the votes for that candidate. We plot the curve, with the cumulative voters along the x axis, and the cumulative votes for that candidate along the y.

The curvature of that line is a measure of the inequality of the distribution of votes. In this case, I call that polarization. Suppose a candidate got 50% of the vote in every single precinct. Then the curve would just be a straight line with a slope of 0.5; there would be no polarization. Alternatively, if a candidate got zero of the votes from 90% of the precincts, but all of the vote in the remaining 10%, then the curve would be flat at 0 for the first 90% of the x-axis, but then bend and shoot up; a sharp curve and a lot of polarization.

View code
vote_cdf <- df_major %>%
  left_join(div_votes) %>%
  group_by(CANDIDATE, year) %>%
  mutate(
    p_vote_div = VOTES / div_votes,
    cand_vote_total = sum(VOTES)
  ) %>%
  arrange(p_vote_div) %>%
  mutate(
    cum_votes = cumsum(VOTES),
    vote_cdf = cum_votes / cand_vote_total,
    cum_denom = cumsum(div_votes) / sum(div_votes)
  ) 

ggplot(
  vote_cdf %>% 
    left_join(df_total) %>%
    filter(year == 2015 & rank <= 7),
  aes(x=cum_denom, y=cum_votes)
) + geom_line(
    aes(group=CANDIDATE, color=CANDIDATE),
    size=1
) +
  geom_text(
    data = vote_cdf %>% 
    left_join(df_total) %>%
    filter(year == 2015 & rank <= 7) %>%
      group_by(CANDIDATE) %>%
      filter(cum_votes == max(cum_votes)),
    aes(label = tolower(CANDIDATE)),
    x = 1.01,
    hjust = 0
  ) +
  xlab("Cumulative voters") +
  scale_y_continuous(
    "Cumulative votes for candidate",
    labels=scales::comma
  ) +
  scale_color_discrete(guide=FALSE)+
  expand_limits(x=1.3)+
  theme_sixtysix() +
  ggtitle(
    "Vote distributions for 2015 Council At Large",
    "Top seven finishers"
  )

plot of chunk gini

Above is that plot for the top seven At Large finishers in 2015 (remember that five Democrats can win). Helen Gym was the fifth. Interestingly, she also was the most polarizing: 49.4% of her votes came from her best 25% of divisions. For comparison, 38.3% of Derek Green’s votes came from his best 25% of divisions.

If we scale each candidate’s y-axis by their final total votes, the difference in curvature is even more stark.

View code
ggplot(
  vote_cdf %>% 
    left_join(df_total) %>%
    filter(year == 2015 & rank <= 7),
  aes(x=cum_denom, y=vote_cdf)
) + geom_line(
  aes(group=CANDIDATE, color=CANDIDATE),
  size=1
) +
  coord_fixed() +
  geom_abline(slope = 1, yintercept=0) +
  xlab("Cumulative voters") +
  ylab("Cumulative proportion of candidate's votes") +
  scale_color_discrete(guide = FALSE) +
  annotate(
    geom="text",
    y = c(0.45, 0.3),
    x = c(0.52, 0.6),
    hjust = c(1, 0),
    label = c("william k greenlee", "helen gym")
  ) +
  theme_sixtysix() +
    ggtitle(
    "Vote distributions for 2015 Council At Large",
    "Top seven finishers, scaled for total votes"
  )

plot of chunk gini_scaled

So Helen Gym snuck in four years ago, with a highly polarized vote. Is that common for new challengers? Not really. Usually, it’s hard to win without more even support.

To summarise the curvature into a single number, the Gini coefficient is defined as the area above the curve but below the 45 degree line, divided by the total area of the triangle. Notice that the more curved the line, the more area between the 45-degree line and the curve, and the higher the coefficient. If there is no inequality, the Gini coefficient is 0, if there’s complete inequality, it’s 1. Helen Gym’s Gini coefficient is 0.35, Bill Greenlee’s is 0.19.

Below I plot each candidate’s proportion of the vote on the x-axis (blue names are winners), and their Gini coefficient on the y-axis (higher values are more polarized).

View code
gini <- vote_cdf %>% 
  arrange(CANDIDATE, year, cum_denom) %>%
  group_by(CANDIDATE, year) %>%
  mutate(
    is_first = cum_denom == min(cum_denom),
    bin_width = cum_denom - ifelse(is_first, 0, lag(cum_denom)),
    avg_height = (vote_cdf + ifelse(is_first, 0, lag(vote_cdf)))/2,
    area = bin_width * avg_height
  ) %>% 
  summarise(
    gini = 1 - 2 * sum(area),
    total_votes = weighted.mean(p_vote_div, div_votes)
  )

ggplot(
  gini %>% left_join(df_total) %>% filter(rank <= 10), 
  aes(x=total_votes, y=gini)
) + 
  geom_text(
    aes(label=tolower(CANDIDATE), color=(rank<=5)),
    size = 3
  ) +
  scale_color_manual(
    "winner", 
    values=c(`TRUE` = strong_blue, `FALSE` = strong_red),
    guide = FALSE
  )+
  scale_x_continuous(
    "proportion of vote",
    expand=expand_scale(mult=0.2)
  ) +
  ylab("gini coefficient (higher means more polarization)")+
  facet_wrap(~year) +
  theme_sixtysix() +
  ggtitle("Total votes versus vote polarization",
          "Top ten finishers for City Council At Large. Winners in blue.")

plot of chunk gini_scatter

Helen Gym had the highest Gini coefficient of any winner in the last four elections, and no one else was close.

There are a few things going on here. First, the winners are usually incumbents, and incumbents probably benefit from name recognition across the city. All of the winners in 2011 were incumbents, for example.

But even the non-incumbents who won had more even support. Allan Domb had the second lowest gini coefficient in 2015, and Derek Green the third. Greenlee and Bill Green had the lowest Gini coefficients when they won as challengers in 2007 (Greenlee was technically an incumbent from a 2006 Special Election).

There are a few ways to view Helen Gym’s polarization. Remember that this is unrelated to total proportion of the vote; she won the fifth most votes, more than candidates who had even and low support across the city. She did so by particularly consolidating her neighborhoods, mobilizing the wealthier, whiter progressive wards that formed her coalition (presumably with the incumbency, she will receive broader support this time around).

View code
# library(sf)
# divs <- st_read("2016_Ward_Divisions.shp", quiet = TRUE)

gym_vote <- divs %>% 
  left_join(
    df_major %>% 
      filter(year == 2015) %>% 
      mutate(WARD_DIVSN = paste0(WARD16, DIV16)) %>% 
      group_by(WARD_DIVSN) %>% 
      mutate(p_vote = VOTES / sum(VOTES)) %>% 
      filter(CANDIDATE == "HELEN GYM")
    )

ggplot(gym_vote)+ 
  geom_sf(
    aes(fill = p_vote * 100),
    color = NA
  ) +
  theme_map_sixtysix() +
  scale_fill_viridis_c("% of Vote") +
  ggtitle(
    "Helen Gym's percent of the vote, 2015",
    "Voters could vote for up to five At Large candidates"
  )