Introducing PhilaStats, the new home of vital statistics for Philadelphia, part one

Jonathan’s note: Today, I’ve asked my friends Megan Todd and Annaka Scheeres of the Data Lab at the Philadelphia Department of Public Health’s Division of Chronic Disease and Injury Prevention to contribute a guest post about their new vital statistics portal, PhilaStats. Check it out!

Guest post by Megan Todd and Annaka Scheeres from the Data Lab at the Philadelphia Department of Public Health, Division of Chronic Disease and Injury Prevention.

View code
### load required libraries
library(tidyverse)
library(magrittr)
library(extrafont)
library(ggiraph)


### load required data

## connect to base ArcGIS Rest API for citywide mortality metrics
death_url <- httr::parse_url("https://services.arcgis.com/fLeGjb7u4uXqeF9q/ArcGIS/rest/services/Vital_Mortality_Cty/FeatureServer/0/query")


## build query for ArcGIS Rest API
death_url$query <- list(where = "METRIC_NAME = 'count_deaths' OR METRIC_NAME = 'crude_years_of_potential_life_lost_to_age_75'",
                        # allows more than 2,000 records to be returned
                        resultType = "standard",
                        # returns all fields
                        outFields = "*",
                        returnGeometry = FALSE,
                        # specifies format of data as JSON
                        f = "json")


## load and format data frame of citywide mortality metrics
death_metrics <- jsonlite::fromJSON(httr::build_url(death_url))$features$attributes %>% 
  janitor::clean_names() %>% 
  select(-objectid) %>% 
  mutate(metric_name = case_when(str_detect(metric_name,"crude") ~ "Years of potential life lost to age 75",
                                 metric_name == "count_deaths" ~ "Count"),
         quality_flag = replace_na(quality_flag,"None"),
         year = as.character(year))


### create functions for figures

## create general ggplot themes
theme_cdip_data_lab <- function() {
  theme_bw(base_family = "Open Sans") +
    theme(
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      axis.title.x =  element_blank(),
      axis.title.y = element_text(size=16),
      axis.ticks = element_blank(),
      legend.title = element_blank(),
      legend.key = element_rect(colour = "transparent", fill = "white"),
      plot.title = element_text(size=15,face = "bold"),
      plot.caption = element_text(colour = "#7a8489"),
      axis.text = element_text(size=14),
      legend.text = element_text(size=14),
      strip.background = element_rect(color = "#999690", fill = "#999690"),
      strip.text = element_text(face = "bold", color = "#ffffff", size = 14),
      panel.spacing.y = unit(1.5,"lines")
    )
}


## create custom ggplotly theme
custom_ggplotly <- function(plot) {
  
  plot %>% 
    # plotly is not attached above, so its functions are called with plotly::
    # set tooltip based on text defined in ggplot aes()
    plotly::ggplotly(tooltip = "text") %>%
    # change font/size formatting in tooltips
    plotly::layout(hoverlabel = list(font=list(family="Open Sans",size=16)),
                   legend = list(title = ""),
                   margin = list(t=80)) %>% 
    # remove extraneous plotly buttons
    plotly::config(modeBarButtonsToRemove = list("zoomIn2d","zoomOut2d","lasso2d","pan3d","pan2d","select2d","autoScale2d","toggleSpikelines","hoverClosestCartesian","hoverCompareCartesian"))
  
}


## function to format numbers nicely
comma <- function(x) format(x, big.mark = ",",scientific=FALSE)


## set leading cause colors for all plots
leading_cause_colors <- c("Heart\ndisease" = "#cc3000",
                          "Cancer" = "#3a833c",
                          "Cerebrovascular\ndiseases" = "#96c9ff",
                          "Chronic lower\nrespiratory diseases" = "#58c04d",
                          "Drug\noverdoses" = "#0f4d90",
                          "Unintentional\ninjuries" = "#fb9a99",
                          "Diabetes" = "#f3c613",
                          "Homicide" = "#f99300",
                          "Chronic kidney\ndisease" = "#9400c6",
                          "Septicemia" = "#2176d2",
                          "Flu &\npneumonia" = "#cab2d6",
                          "Suicide" = "#80899b",
                          "Perinatal period" = "#e7298a")


## function to make leading cause figures
leading_cause_figure <- function(metric_name_input,title_input) {
  
  ranked_death_df <- death_metrics %>%
    # filter data to all races/ethnicities
    filter(race_ethnicity == "All races/ethnicities",
           # filter data to desired metric (all years are kept)
           metric_name == metric_name_input,
           # filter data to all ages
           age_category == "All ages",
           # filter data to all sexes
           sex == "All sexes",
           # filter data to causes ranked in the top 10
           rank %in% 1:10) %>% 
    # reformat leading_cause_death text so labels display nicely on figure (i.e., insert line breaks)
    mutate(leading_cause_death = case_when(str_detect(leading_cause_death,"Unintentional") ~ "Unintentional\ninjuries",
                                           str_detect(leading_cause_death,"lower") ~ "Chronic lower\nrespiratory diseases",
                                           str_detect(leading_cause_death,"Cerebrovascular") ~ "Cerebrovascular\ndiseases",
                                           str_detect(leading_cause_death,"Drug") ~ "Drug\noverdoses",
                                           str_detect(leading_cause_death,"Heart") ~ "Heart\ndisease",
                                           str_detect(leading_cause_death,"Influenza") ~ "Flu &\npneumonia",
                                           str_detect(leading_cause_death,"Chronic kidney") ~ "Chronic kidney\ndisease",
                                           str_detect(leading_cause_death,"Intentional") ~ "Suicide",
                                           str_detect(leading_cause_death,"Certain") ~ "Perinatal period",
                                           TRUE ~ leading_cause_death))


  # expand table to include all possible combinations of variables -- needed to make sure lines don't connect across non-consecutive years
  # example: if a cause is in the top 10 for 2014 and 2016 but not 2015, this section adds an NA row for 2015, which prevents the 2014 and 2016 points from connecting
  ranked_death_df %<>%
    expand(year,sex,race_ethnicity,age_category,leading_cause_death,metric_name) %>% 
    # re-join with original data frame to include missing values for years
    left_join(ranked_death_df)
  
  
  # filter to minimum year selected -- use for y-axis cause of death labels
  leading_cause_list <- ranked_death_df %>% 
    filter(year == min(year),
           !is.na(rank)) %>% 
    # sort leading causes by rank
    arrange(rank) %>% 
    # pull leading_cause_death as a vector
    pull(leading_cause_death)

  
  base_plot <- ranked_death_df %>% 
    # set year as x-axis variable
    ggplot(aes(x=year,
               # set rank as y-axis variable -- desc() sorts rank appropriately
               y=desc(rank),
               # set leading_cause_death as color
               color = leading_cause_death,
               # set leading_cause_death as group for line geometry
               group = leading_cause_death,
               # define text in tooltip
               tooltip=str_c("Year: ",year,
                             "\nCause of death: ",leading_cause_death,
                             "\nMetric: ",str_to_title(metric_name),
                             "\nValue: ",comma(round(metric_value,2)),
                             "\nQuality concerns: ",quality_flag))) +
    # use ggiraph function to make points interactive (i.e., hoverable)  
    geom_point_interactive(size = 12) +
    # specify line geometry
    geom_line(size = 2) +
    # manually set color for each leading cause of death
    scale_color_manual(values=leading_cause_colors) + 
    # apply pre-defined ggplot theme
    theme_cdip_data_lab() +
    # apply y-axis labels using leading_cause_list vector
    scale_y_continuous(labels = leading_cause_list,
                       breaks = c(-1:-10)) +
    # create rank # label for each point
    geom_text(aes(label=rank),color="white",size=5,family="Open Sans") +
    # apply additional thematic styling
    theme(panel.grid.major = element_blank(),
          panel.border = element_blank(),
          legend.position = "none",
          axis.title.y = element_blank(),
          axis.text.x = element_text(size=16,face="bold"),
          # color based on leading_cause_colors vector sorted by leading_cause_list
          axis.text.y = element_text(size=15,face="bold",color = leading_cause_colors[leading_cause_list]),
          plot.title = element_text(size=17),
          # use ggtext package to apply stylistic formatting to caption
          plot.caption = ggtext::element_markdown(colour = "black",size=14,margin=margin(t=12),lineheight = 1.5),
          plot.subtitle = element_text(colour = "black",size=15),
          plot.title.position = "plot") +
    #set dynamic figure title and figure subtitle
    labs(title = str_c("Ranked underlying causes of",title_input,"in Philadelphia, PA",sep = " "),
         subtitle = "All races/ethnicities, All sexes, All ages",
         caption = "<b>Source:</b> PA Vital Registration System<br><b>Notes:</b> NH = non-Hispanic; Unintentional injuries exclude drug overdoses")


  # convert ggplot object to girafe object
  girafe_plot <- girafe(ggobj = base_plot,
                        width_svg = 12,
                        height_svg = 6)


  # make tooltip color match color of corresponding point
  girafe_options(x = girafe_plot,
                 opts_tooltip(use_fill = TRUE,
                              css = "font-family: 'Open Sans';font-size: 14px;color:white;"))
  
}
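
The gap-filling step inside leading_cause_figure() can be hard to picture, so here is a minimal sketch of the same idea on a made-up table (the years, causes, and ranks are hypothetical, not PhilaStats data). To keep it runnable without extra packages, base R's expand.grid() and merge() stand in for the expand() and left_join() calls used above; this is an illustration of the technique, not the code behind the dashboard.

```r
# Hypothetical ranks: Diabetes is in the top 10 in 2014 and 2016 but not 2015
ranked <- data.frame(
  year  = c("2014", "2015", "2016", "2014", "2016"),
  cause = c("Cancer", "Cancer", "Cancer", "Diabetes", "Diabetes"),
  rank  = c(2, 2, 2, 10, 9),
  stringsAsFactors = FALSE
)

# Build every year/cause combination that appears in the data
all_combos <- expand.grid(
  year  = unique(ranked$year),
  cause = unique(ranked$cause),
  stringsAsFactors = FALSE
)

# Left join back to the ranks: combinations with no match get rank = NA
filled <- merge(all_combos, ranked, by = c("year", "cause"), all.x = TRUE)
filled <- filled[order(filled$cause, filled$year), ]

print(filled)
```

The Diabetes row for 2015 now exists with a missing rank, so a line geometry grouped by cause breaks at 2015 instead of drawing a segment straight from 2014 to 2016.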

The Philadelphia Department of Public Health recently launched a new resource for health information on the residents of Philadelphia. PhilaStats is an interactive dashboard that highlights statistics and trends in population, mortality (deaths), and natality (births) for Philadelphia residents over the past decade. The most recent year of data is 2019; we expect the state to release final numbers for 2020 sometime this summer and will update PhilaStats at that time.

What are vital statistics and why should we care?

The Commonwealth of Pennsylvania is required by law to register vital events – that is, births and deaths – and report statistics on these events to the federal government. Vital statistics are the most complete data available to public health officials, and they provide crucial insights into trends in population health, such as birth and death rates, progress toward reducing deaths from specific causes like cancer or homicide, and the proportion of births that are preterm.

What questions can be answered by the data on PhilaStats?

PhilaStats can help you answer many questions about population health, including:

  • What causes of death are responsible for mortality at younger ages?
  • Does life expectancy at birth differ by sex and race/ethnicity?
  • Have we made progress toward reducing the number of teen parents?
  • Is mortality from diabetes more common in areas of the city where fewer residents have health insurance?

In this series of posts, we’ll explore the first two questions.

What kills young people in Philadelphia?

PhilaStats allows users to explore leading causes of death in Philadelphia via the mortality tab. Box C shows leading causes of death in each year from 2012 to 2019. What are the most common causes of death?

View code
lcf_count <- leading_cause_figure(metric_name_input = "Count",
                                  title_input = "death")

Figure: Ranked underlying causes of death in Philadelphia, PA (interactive; visit PhilaStats to customize).

The most common causes of death in 2019 were heart disease, cancer, drug overdoses, cerebrovascular diseases (stroke), and chronic lower respiratory diseases. Unintentional injuries, septicemia, chronic kidney disease, homicide, and diabetes round out the ten leading causes of death. Most of these causes of death are diseases that occur at older ages, since most Philadelphians die as older adults.

To see what is driving mortality in younger Philadelphians, we can switch the metric from causes of death to causes of premature death. Years of potential life lost (YPLL) measures exactly this: it estimates the total number of years of life lost by those who died prematurely, which we define here as before age 75. If someone died at 60, they would contribute 15 years to YPLL, and if someone died at 30, they would contribute 45 years. YPLL is calculated as a sum across all people who died prematurely, representing all the person-years of life lost before age 75. Thus, YPLL gives more weight to causes of death that occur at younger ages than to causes that occur at older ages.
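
The arithmetic above can be sketched in a few lines of R. The ages at death below are made up for illustration (they are not PhilaStats data), and the variable names are our own; the only thing taken from the definition is the cutoff of age 75.

```r
# Hypothetical ages at death for four people (illustration only)
ages_at_death <- c(60, 30, 82, 75)

# Premature death is defined here as death before age 75
ypll_limit <- 75

# Each person contributes the years remaining to age 75;
# deaths at or after age 75 contribute zero
years_lost <- pmax(ypll_limit - ages_at_death, 0)

# YPLL is the sum across everyone who died
total_ypll <- sum(years_lost)

print(years_lost)
print(total_ypll)
```

In this toy example the deaths at 60 and 30 contribute 15 and 45 years, the deaths at 82 and 75 contribute nothing, and the total YPLL is 60 person-years.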

View code
lcf_years <- leading_cause_figure(metric_name_input = "Years of potential life lost to age 75",
                                  title_input = "years of potential life lost to age 75")

Figure: Ranked underlying causes of years of potential life lost to age 75 in Philadelphia, PA (interactive; visit PhilaStats to customize).

The list of leading causes of premature death looks different from the list of leading causes of death; notably, external causes of death jump up in the ranking. In 2019, drug overdoses were the biggest contributor to YPLL, with homicide coming in fourth (behind heart disease and cancer). Certain conditions originating in the perinatal period (after 28 weeks gestation or within 7 days of birth) were the sixth leading contributor to YPLL, even though deaths due to these conditions are relatively rare. This is because deaths caused by perinatal conditions – like injuries or infections during the birthing process – occur early in life by definition, so a single death during the perinatal period contributes nearly 75 years to YPLL before age 75.
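
To see how a rare cause can outrank a common one on YPLL, consider a small sketch with invented numbers (the causes are real categories, but the ages and counts are hypothetical and chosen only to make the contrast obvious).

```r
# Hypothetical deaths: five from heart disease at older ages,
# one from a perinatal condition at birth (illustration only)
deaths <- data.frame(
  cause        = c(rep("Heart disease", 5), "Perinatal conditions"),
  age_at_death = c(70, 72, 80, 85, 68, 0),
  stringsAsFactors = FALSE
)

# Years of potential life lost to age 75 for each death
deaths$years_lost <- pmax(75 - deaths$age_at_death, 0)

# Count of deaths and total YPLL by cause
count_by_cause <- tapply(deaths$years_lost, deaths$cause, length)
ypll_by_cause  <- tapply(deaths$years_lost, deaths$cause, sum)

print(count_by_cause)
print(ypll_by_cause)
```

Here heart disease accounts for five of the six deaths but only 15 years of potential life lost, while the single perinatal death accounts for 75, so the two causes swap places depending on which metric is used for the ranking.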

This figure can be further customized to show leading causes of death and YPLL among specific demographic groups: by sex, race/ethnicity, and age group. How do leading causes of death differ between Hispanic men and non-Hispanic Black men? Do non-Hispanic Asian women die of the same causes of death as non-Hispanic white women? Explore the mortality tab of PhilaStats to find out.

How does life expectancy at birth differ by sex and race/ethnicity? In the next post in this series, we’ll use PhilaStats to explore this question.

We would love to hear how you’re using PhilaStats. You can reach the Data Lab by email, or find Megan, the Director of the Data Lab, on Twitter.