Introducing PhilaStats, the new home of vital statistics for Philadelphia, part two

Jonathan’s note: This is Part Two of a series about the new vital statistics portal PhilaStats, provided by Megan Todd and Annaka Scheeres of the Data Lab at the Philadelphia Department of Public Health‘s Division of Chronic Disease and Injury Prevention. Read Part One here.

Guest post by Megan Todd and Annaka Scheeres from the Data Lab at the Philadelphia Department of Public Health, Division of Chronic Disease and Injury Prevention

View code
### load required libraries
library(tidyverse)
library(magrittr)
library(extrafont)
library(rphl)
library(ggiraph)


### load required data

## connect to base ArcGIS Rest API for citywide mortality metrics
death_url <- httr::parse_url("https://services.arcgis.com/fLeGjb7u4uXqeF9q/ArcGIS/rest/services/Vital_Mortality_Cty/FeatureServer/0/query")


## build query for ArcGIS Rest API
death_url$query <- list(where = "METRIC_NAME = 'life_expectancy' AND AGE_CATEGORY = 'At birth'",
                        # allows > 2,000 records to be loaded
                        resultType = "standard",
                        # returns all fields
                        outFields = "*",
                        returnGeometry = FALSE,
                        # specifies format of data as JSON
                        f = "json")


## load and format data frame of citywide mortality metrics
death_metrics <- jsonlite::fromJSON(httr::build_url(death_url))$features$attributes %>% 
  janitor::clean_names() %>% 
  select(-c(objectid,rank)) %>% 
  mutate(metric_name = "Life expectancy at birth",
         quality_flag = case_when(race_ethnicity %in% c("Asian (NH)","Hispanic") ~ "Some concerns due to small population size",
                                  TRUE ~ "None"),
         year = as.character(year))


### create functions for figures

## create general ggplot themes
theme_cdip_data_lab <- function() {
  theme_bw(base_family = "Open Sans") +
    theme(
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      axis.title.x =  element_blank(),
      axis.title.y = element_text(size=16),
      axis.ticks = element_blank(),
      legend.title = element_blank(),
      legend.key = element_rect(colour = "transparent", fill = "white"),
      plot.title = element_text(size=15,face = "bold"),
      plot.caption = element_text(size=13,colour = "#7a8489"),
      axis.text = element_text(size=14),
      legend.text = element_text(size=14),
      strip.background = element_rect(color = "#999690", fill = "#999690"),
      strip.text = element_text(face = "bold", color = "#ffffff", size = 14),
      panel.spacing.y = unit(1.5,"lines")
    )
}


## function to format numbers nicely
comma <- function(x) format(x, big.mark = ",",scientific=FALSE)


## set race/ethnicity colors for all plots
race_eth_colors <- c("All races/ethnicities"="#2176d2",
                     "Asian (NH)"="#96c9ff",
                     "Black (NH)"= "#f99300",
                     "Hispanic"="#58c04d",
                     "White (NH)"="#f3c613")


## set sex colors for all plots
sex_colors <- c("All sexes"="#2176d2",
                "Female"="#f3c613",
                "Male"="#96c9ff")


## function to make life expectancy figures
e_0_figure <- function(race_eth_input,fill_variable,fill_colors) {
  
  base_plot <- death_metrics %>%
    # filter data to life expectancy metric in 2019
    filter(race_ethnicity %in% race_eth_input,
           year == 2019,
           metric_name == "Life expectancy at birth") %>% 
    # set race_ethnicity as x-axis variable
    ggplot(aes(x=race_ethnicity,
               # set metric_value (i.e., life expectancy) as y-axis variable
               y=metric_value,
               # pipe in desired fill variable (sex or race_ethnicity)
               fill = {{fill_variable}},
               # define text in tooltip
               tooltip = str_c("Value: ",comma(round(metric_value,2)),
                               "\nMetric: ",metric_name,
                               "\nYear: ",year,
                               "\nRace/ethnicity: ",race_ethnicity,
                               "\nQuality concerns: ",quality_flag))) +
    # manually set color scheme
    scale_fill_manual(values=fill_colors) +
    # specify column geometry
    geom_col_interactive() +
    # facet by sex
    facet_wrap(~sex) +
    # apply pre-defined ggplot theme
    theme_cdip_data_lab() +
    # set figure and axis titles
    labs(title = "Life expectancy at birth in Philadelphia, PA (2019)",
         caption = "Source: PA Vital Registration System; Notes: NH = non-Hispanic",
         y = "Years")
  
  
  # if race/ethnicity input is longer than 1, set x-axis labels to 45 degree angle
  if (length(race_eth_input) > 1) {
    
    base_plot <- base_plot +
      guides(x = guide_axis(angle = 56))
    
  } else {base_plot}
  
  
  # convert ggplot object to girafe object
  girafe_plot <- girafe(ggobj = base_plot,
                        width_svg = 12,
                        height_svg = 6)


  # make tooltip color match color of corresponding point
  girafe_options(x = girafe_plot,
                 opts_tooltip(use_fill = TRUE,
                              css = "font-family: 'Open Sans';font-size: 14px;color:white;"))
  
}

The Philadelphia Department of Public Health recently launched a new resource for health information on the residents of Philadelphia. PhilaStats is an interactive dashboard that highlights statistics and trends in population, mortality (deaths), and natality (births) for Philadelphia residents over the past decade. The most recent year of data is 2019; we expect the state to release final numbers for 2020 sometime this summer and will update PhilaStats at that time.

In part two of this series, we will explore life expectancy at birth in Philadelphia. Missed part one? Catch up here.

How does life expectancy at birth differ by sex and race/ethnicity?

Life expectancy at birth is typically estimated in a given year as the number of years a person born in that year would live, assuming mortality rates remain constant throughout that person’s life. This assumption is false, of course; a person born in 2022 would not experience the same mortality rates at age 65 as those who were 65 in 2022. Instead, they would experience mortality rates at age 65 in 2087, which we might expect to be lower than the 2022 rates given recent trends in mortality reduction over time.

Because of this assumption, life expectancy is sensitive to mortality shocks. For example, life expectancy plummeted during the 1918 influenza pandemic; however, life expectancy was calculated for a hypothetical person who would live every year of their life in 1918, which very few people actually experienced. The same is true for the declines in life expectancy we expect to see during the COVID-19 pandemic when 2020 data are released: most people (except for infants who die the same year they’re born) will not actually live their entire life subject to these elevated mortality rates.

View code
e0_sex <- e_0_figure(race_eth_input = "All races/ethnicities", 
           fill_variable = sex, 
           fill_colors = sex_colors)

Click here to expand and visit PhilaStats to customize.

Life expectancy in Philadelphia was stable at around 76 years between 2012 and 2019. Mirroring a trend we see pretty much everywhere in the world, life expectancy in 2019 was higher for women (80.1 years) than it was for men (73.0 years). Life expectancy in Philadelphia was lower than in the US as a whole – only by a few months for women, but by more than three years for men.

View code
e0_race <- e_0_figure(race_eth_input = c("All races/ethnicities","Asian (NH)","Black (NH)","Hispanic","White (NH)"), 
           fill_variable = race_ethnicity, 
           fill_colors = race_eth_colors)

Click here to expand and visit PhilaStats to customize.

In addition to differences by sex, there was substantial variation in life expectancy by race/ethnicity as well. Of any sex and race/ethnicity combination, life expectancy in 2019 was lowest for non-Hispanic Black men at just 69.0 years. In comparison, life expectancy for non-Hispanic white men and Hispanic men was 74.8 years and 75.8 years, respectively. There was a similar trend among women in 2019: non-Hispanic Black women had a life expectancy of 77.4 years, compared to 81.0 years for non-Hispanic white women and 85.2 years for Hispanic women.

I haven’t yet mentioned life expectancy among non-Hispanic Asian groups in Philadelphia. You’ll see from the above graph that non-Hispanic Asian men and women have the highest life expectancy of any race/ethnicity group, at 90.7 years for both women and men. Interestingly, there’s no sex difference! This is unusual. Even more striking is the size of the non-Hispanic Asian advantage in life expectancy. Life expectancy for non-Hispanic Asian women is about 10.5 years higher than life expectancy for all women in Philadelphia. Life expectancy for non-Hispanic Asian men is nearly 18 years higher than it is for all men.

What could be accounting for the large non-Hispanic Asian life expectancy advantage we see in Philadelphia’s vital statistics? Is this difference real? There are two possibilities: either vital statistics are accurately capturing longer lives lived by non-Hispanic Asian Philadelphians compared to the rest of the population; or there could be problems in our data that suggest an advantage that doesn’t exist or overestimate the size of the advantage.

We have several reasons to believe that non-Hispanic Asian Philadelphians truly do live longer than the rest of the population. This pattern – albeit at a smaller scale – has been found for the US as a whole. Research has found that Asian American populations have lower mortality rates than other race/ethnicity groups in the US, no matter what specific cause of death is considered. Another stream of research has found that foreign-born Americans live longer than US-born Americans, and Asian-born Americans have exceptionally high longevity.

Are non-Hispanic Asian Philadelphians were more likely that other Philadelphians to be foreign-born? If so, this could contribute to the life expectancy advantage. This is a question we can answer relatively easily using Kyle Walker’s brilliant R package tidycensus.

View code
### Step 1: Set up for accessing data ----------------------------- 

## load required packages for accessing and analyzing Census data 
library(tidycensus) 
library(srvyr) 


## Set Census API key (can be obtained from http://api.census.gov/data/key_signup.html) 
# census_api_key("enter API key here", install = T) 


## pums_variables: searchable data catalog built into tidycensus 
pums_vars <- pums_variables %>%
  filter(year == 2019, survey == "acs5") %>%
  distinct(var_code, var_label, data_type, level) %>%
  filter(level == "person", !str_detect(var_label, "allocation flag"))


## list of Philadelphia pumas 
phila_pumas <- c("03201", "03202", "03203", "03204", "03205", "03206", "03207", "03208","03209", "03210", "03211") 


### Step 2: Importing data from tidycensus::get_pums() --------------------- 
if(file.exists("cached/phila_pums_2019.RDS")){
  phila_pums_2019 <- readRDS("cached/phila_pums_2019.RDS")
} else {
phila_pums_2019 <- get_pums(
  state = "PA", 
  puma = phila_pumas, 
  survey = "acs5", 
  year = 2019,
  recode = TRUE, # note: not available for 2020 yet 
  # select desired variables to include 
  variables = c("RAC1P","HISP","NATIVITY"),
  # load person level weights for weighted survey object 
  rep_weights = "person") %>% 
  # recode NATIVITIY variable into binary native and foreign born variables
  mutate(native_born = NATIVITY == 1, 
         foreign_born = NATIVITY == 2, 
         # create recoded race/ethnicity field based on RAC1P and HISP variables
         race_ethnicity = factor(
           case_when(HISP != "01" ~ "Hispanic",
                     RAC1P == 6 & HISP == "01" ~ "Asian, non-Hispanic",
                     RAC1P == 2 & HISP == "01" ~ "Black, non-Hispanic",
                     RAC1P == 1 & HISP == "01" ~ "White, non-Hispanic",
                     TRUE ~ "Other, non-Hispanic")
           ),
         # rearrange levels of factor so "Other, non-Hispanic" is last
         race_ethnicity = fct_relevel(race_ethnicity,"Other, non-Hispanic",after=4L))
}


### Step 3 - Create weighted survey object ---------------------------- 

## create weighted survey object with person-level weights
w_phila_pums_2019 <- to_survey(phila_pums_2019, type = "person") 
### Step 4 - Generate table output ---------------------------- 

## create formatted output HTML table 
w_phila_pums_2019 %>% 
  # group by race and Hispanic ethnicity variables 
  group_by(race_ethnicity) %>% 
  # select native_born and foreign_born columns to summarise
  summarise(across(ends_with("_born"),
                   # create list of functions to apply to selected columns: survey mean and total
                   # vartype = NULL excludes uncertainty estimates from output 
                   .fns = list(p = ~survey_mean(.x, vartype = NULL),
                               t = ~survey_total(.x, vartype = NULL)))) %>% 
  # add total population summary row
  bind_rows(w_phila_pums_2019 %>% 
              # select native_born and foreign_born columns to summarise
              summarise(across(ends_with("_born"),
                   # create list of functions to apply to selected columns: survey mean and total
                   # vartype = NULL excludes uncertainty estimates from output
                   .fns = list(p = ~survey_mean(.x, vartype = NULL),
                               t = ~survey_total(.x, vartype = NULL)))) %>% 
              mutate(race_ethnicity = "Total Population")) %>% 
  # format %s and totals nicely
  mutate(across(ends_with("_p"),~str_c(round(.x*100),"%")),
         across(ends_with("_t"),~comma(.x))) %>% 
  # convert table to HTML format 
  knitr::kable(format = "html",
               # clean up column names 
               col.names = c("Race/Ethnicity", 
                             "% Native Born",
                             "# Native Born", 
                             "% Foreign Born", 
                             "# Foreign Born"),
               caption = "Nativity by Race/Ethnicity in Philadelphia, PA (2019)") %>% 
  # add footnote with data source to table
  kableExtra::footnote(general = "Source: American Community Survey Public Use Microdata Series") %>% 
  # add nice styling to HTML table 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                            full_width = T) 
Nativity by Race/Ethnicity in Philadelphia, PA (2019)
Race/Ethnicity % Native Born # Native Born % Foreign Born # Foreign Born
Asian, non-Hispanic 34% 38,492 66% 75,241
Black, non-Hispanic 93% 595,028 7% 47,731
Hispanic 81% 187,232 19% 45,000
White, non-Hispanic 91% 495,663 9% 48,834
Other, non-Hispanic 85% 39,645 15% 6,967
Total Population 86% 1,356,060 14% 223,773
Note:
Source: American Community Survey Public Use Microdata Series

Click here to expand.

The table above shows that 66% of non-Hispanic Asian Philadelphians were born outside of the US, compared to only 14% for the city as a whole. So, non-Hispanic Asian Philadelphians are disproportionately foreign-born, which may explain why life expectancy for this group is so much higher than any other race/ethnicity group.

Okay, so we hope we’ve convinced you that non-Hispanic Asian Philadelphians really do have a life expectancy advantage over other race/ethnicity groups. But the size of the advantage might be overestimated because of inaccuracies in the data.

One documented problem is that race/ethnicity is inconsistency recorded, especially for Asian Americans. Estimates of mortality, which feed into life expectancy calculations, typically use data from two sources: the race/ethnicity recorded on death certificates is used for the numerator (number of deaths) and the race/ethnicity from population estimates like the Census is used for the denominator (number of people). When race/ethnicity is recorded in the Census, it’s almost always reported by the person themselves, or by another member of the household. Race/ethnicity on a death certificate, however, is more likely to be reported by someone who may not know the person’s race/ethnicity, such as a physician or medical examiner.

Studies have compared how the race/ethnicity of specific individuals is recorded in various data sources over time to assess the consistency and accuracy of these data sources. For example, a person’s race/ethnicity may be recorded in one way in a decennial Census (when race/ethnicity is reported either by themselves or a household member), and another way on their death certificate (which may be reported by someone who does not know the person’s race/ethnicity). These studies have found that race/ethnicity is reported fairly accurately on the death certificate – meaning, the same race/ethnicity is identified on the death certificate as was self-selected on an earlier Census form or survey – only for non-Hispanic white and non-Hispanic Black Americans. Hispanic Americans, Asian Americans, and especially American Indians and Alaska Natives (AIAN) were underreported on death certificates. That is, someone who self-identified as Asian was more likely to be misclassified as some other race/ethnicity on their death certificate.

If death certificates underestimate the number of Asian Americans (or Asian Philadelphians) who die, this means the mortality rate for Asian Americans is underestimated, and thus life expectancy for Asian Americans is overestimated. The same is true for Hispanic Americans and AIAN Americans. When you see standard life expectancy estimates based on data from death certificates and the Census for Asian, Hispanic, and AIAN populations in the US, they are likely too high. We don’t know exactly how elevated these estimates are, but some recent research suggests the overestimation is significant in statistical terms.

So our conclusion about non-Hispanic Asian life expectancy in Philadelphia is the following. Non-Hispanic Asian Philadelphians live quite a bit longer than non-Asian Philadelphians. This is likely due to the fact that Non-Hispanic Asian Philadelphians are disproportionately foreign-born compared to other groups. However, these data are not perfect, and we think the true life expectancy advantage enjoyed by non-Hispanic Asian Philadelphians may be slightly smaller than what we see on PhilaStats.

Thanks for exploring PhilaStats with us!

We would love to hear how you’re using PhilaStats. You can reach the Data Lab by email, or find Megan, the Director of the Data Lab, on Twitter.

Introducing PhilaStats, the new home of vital statistics for Philadelphia, part one

Jonathan’s note: Today, I’ve asked my friends Megan Todd and Annaka Scheeres of the Data Lab at the Philadelphia Department of Public Health‘s Division of Chronic Disease and Injury Prevention to contribute a guest post about their new vital statistics portal, PhilaStats. Check it out!

Guest post by Megan Todd and Annaka Scheeres from the Data Lab at the Philadelphia Department of Public Health, Division of Chronic Disease and Injury Prevention.

View code
### load required libraries
library(tidyverse)
library(magrittr)
library(extrafont)
library(ggiraph)


### load required data

## connect to base ArcGIS Rest API for citywide mortality metrics
death_url <- httr::parse_url("https://services.arcgis.com/fLeGjb7u4uXqeF9q/ArcGIS/rest/services/Vital_Mortality_Cty/FeatureServer/0/query")


## build query for ArcGIS Rest API
death_url$query <- list(where = "METRIC_NAME = 'count_deaths' OR METRIC_NAME = 'crude_years_of_potential_life_lost_to_age_75'",
                        # allows > 2,000 to be loaded
                        resultType = "standard",
                        # returns all fields
                        outFields = "*",
                        returnGeometry = FALSE,
                        # specifies format of data as JSON
                        f = "json")


## load and format data frame of citywide mortality metrics
death_metrics <- jsonlite::fromJSON(httr::build_url(death_url))$features$attributes %>% 
  janitor::clean_names() %>% 
  select(-objectid) %>% 
  mutate(metric_name = case_when(str_detect(metric_name,"crude") ~ "Years of potential life lost to age 75",
                                 metric_name == "count_deaths" ~ "Count"),
         quality_flag = replace_na(quality_flag,"None"),
         year = as.character(year))


### create functions for figures

## create general ggplot themes
theme_cdip_data_lab <- function() {
  theme_bw(base_family = "Open Sans") +
    theme(
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      axis.title.x =  element_blank(),
      axis.title.y = element_text(size=16),
      axis.ticks = element_blank(),
      legend.title = element_blank(),
      legend.key = element_rect(colour = "transparent", fill = "white"),
      plot.title = element_text(size=15,face = "bold"),
      plot.caption = element_text(colour = "#7a8489"),
      axis.text = element_text(size=14),
      legend.text = element_text(size=14),
      strip.background = element_rect(color = "#999690", fill = "#999690"),
      strip.text = element_text(face = "bold", color = "#ffffff", size = 14),
      panel.spacing.y = unit(1.5,"lines")
    )
}


## create custom ggplotly theme
custom_ggplotly <- function(plot) {
  
  plot %>% 
    # set tooltip based on text defined in ggplot aes()
    ggplotly(tooltip = "text") %>%
    #change font/size formatting in tooltips
    layout(hoverlabel = list(font=list(family="Open Sans",size=16)),
           legend = list(title = ""),
           margin = list(t=80)) %>% 
    # remove extraneous plotly buttons
    config(modeBarButtonsToRemove = list("zoomIn2d","zoomOut2d","lasso2d","pan3d","pan2d","select2d","autoScale2d","toggleSpikelines","hoverClosestCartesian","hoverCompareCartesian"))
  
}


## function to format numbers nicely
comma <- function(x) format(x, big.mark = ",",scientific=FALSE)


## set leading cause colors for all plots
leading_cause_colors <- c("Heart\ndisease" = "#cc3000",
                          "Cancer" = "#3a833c",
                          "Cerebrovascular\ndiseases" = "#96c9ff",
                          "Chronic lower\nrespiratory diseases" = "#58c04d",
                          "Drug\noverdoses" = "#0f4d90",
                          "Unintentional\ninjuries" = "#fb9a99",
                          "Diabetes" = "#f3c613",
                          "Homicide" = "#f99300",
                          "Chronic kidney\ndisease" = "#9400c6",
                          "Septicemia" = "#2176d2",
                          "Flu &\npneumonia" = "#cab2d6",
                          "Suicide" = "#80899b",
                          "Perinatal period" = "#e7298a")


## function to make leading cause figures
leading_cause_figure <- function(metric_name_input,title_input) {
  
  ranked_death_df <- death_metrics %>%
    # filter data to all races/ethncities
    filter(race_ethnicity == "All races/ethnicities",
           # filter data to desired metric in 2019
           metric_name == metric_name_input,
           # filter data to all ages
           age_category == "All ages",
           # filter data to all sexes
           sex == "All sexes",
           # filter data to causes ranked in the top 10
           rank %in% 1:10) %>% 
    # reformat leading_cause_death text so labels display nicely on figure (i.e., insert line breaks)
    mutate(leading_cause_death = case_when(str_detect(leading_cause_death,"Unintentional") ~ "Unintentional\ninjuries",
                                           str_detect(leading_cause_death,"lower") ~ "Chronic lower\nrespiratory diseases",
                                           str_detect(leading_cause_death,"Cerebrovascular") ~ "Cerebrovascular\ndiseases",
                                           str_detect(leading_cause_death,"Drug") ~ "Drug\noverdoses",
                                           str_detect(leading_cause_death,"Heart") ~ "Heart\ndisease",
                                           str_detect(leading_cause_death,"Influenza") ~ "Flu &\npneumonia",
                                           str_detect(leading_cause_death,"Chronic kidney") ~ "Chronic kidney\ndisease",
                                           str_detect(leading_cause_death,"Intentional") ~ "Suicide",
                                           str_detect(leading_cause_death,"Certain") ~ "Perinatal period",
                                           TRUE ~ leading_cause_death))


  # expand table to include all possible combinations of variables -- needed to make sure lines don't connect across non-subsequent years
  # example: if cause is in top 10 for 2014 and 2016 but not 2015, this section adds in an NA value for 2015, which prevents the 2014 and 2016 points from connecting
  ranked_death_df %<>%
    expand(year,sex,race_ethnicity,age_category,leading_cause_death,metric_name) %>% 
    # re-join with original data frame to include missing values for years
    left_join(ranked_death_df)
  
  
  # filter to minimum year selected -- use for y-axis cause of death labels
  leading_cause_list <- ranked_death_df %>% 
    filter(year == min(year),
           !is.na(rank)) %>% 
    # sort leading causes by rank
    arrange(rank) %>% 
    # pull leading_cause_date as a vector
    pull(leading_cause_death)

  
  base_plot <- ranked_death_df %>% 
    # set year as x-axis variable
    ggplot(aes(x=year,
               # set rank as y-axis variable -- desc() sorts rank appropriately
               y=desc(rank),
               # set leading_cause_death as color
               color = leading_cause_death,
               # set leading_cause_death as group for line geometry
               group = leading_cause_death,
               # define text in tooltip
               tooltip=str_c("Year: ",year,
                             "\nCause of death: ",leading_cause_death,
                             "\nMetric: ",str_to_title(metric_name),
                             "\nValue: ",comma(round(metric_value,2)),
                             "\nQuality concerns: ",quality_flag))) +
    # use ggiraph function to make points interactive (i.e., hoverable)  
    geom_point_interactive(size = 12) +
    # specify line geometry
    geom_line(size = 2) +
    # manually set color for each leading cause of death
    scale_color_manual(values=leading_cause_colors) + 
    # apply pre-defined ggplot theme
    theme_cdip_data_lab() +
    # apply y-axis labels using leading_causes vector
    scale_y_continuous(labels = leading_cause_list,
                       breaks = c(-1:-10)) +
    # create rank # label for each point
    geom_text(aes(label=rank),color="white",size=5,family="Open Sans") +
    # apply additional thematic styling
    theme(panel.grid.major = element_blank(),
          panel.border = element_blank(),
          legend.position = "none",
          axis.title.y = element_blank(),
          axis.text.x = element_text(size=16,face="bold"),
          # color based on leading_cause_colors vector sorted by leading_cause_list
          axis.text.y = element_text(size=15,face="bold",color = leading_cause_colors[leading_cause_list]),
          plot.title = element_text(size=17),
          # use ggtext package to apply stylistic formatting to caption
          plot.caption = ggtext::element_markdown(colour = "black",size=14,margin=margin(t=12),lineheight = 1.5),
          plot.subtitle = element_text(colour = "black",size=15),
          plot.title.position = "plot") +
    #set dynamic figure title and figure subtitle
    labs(title = str_c("Ranked underlying causes of",title_input,"in Philadelphia, PA",sep = " "),
         subtitle = "All races/ethnicities, All sexes, All ages",
         caption = "<b>Source:</b> PA Vital Registration System<br><b>Notes:</b> NH = non-Hispanic; Unintentional injuries exclude drug overdoses")


  # convert ggplot object to girafe object
  girafe_plot <- girafe(ggobj = base_plot,
                        width_svg = 12,
                        height_svg = 6)


  # make tooltip color match color of corresponding point
  girafe_options(x = girafe_plot,
                 opts_tooltip(use_fill = TRUE,
                              css = "font-family: 'Open Sans';font-size: 14px;color:white;"))
  
}

The Philadelphia Department of Public Health recently launched a new resource for health information on the residents of Philadelphia. PhilaStats is an interactive dashboard that highlights statistics and trends in population, mortality (deaths), and natality (births) for Philadelphia residents over the past decade. The most recent year of data is 2019; we expect the state to release final numbers for 2020 sometime this summer and will update PhilaStats at that time.

What are vital statistics and why should we care?

The Commonwealth of Pennsylvania is required by law to register vital events – that is, births and deaths – and report statistics on these events to the federal government. Vital statistics are the most complete data available to public health officials, and they provide crucial insights into trends in population health, like: birth and death rates, progress toward reducing deaths from specific causes like cancer or homicide, and the proportion of births that are preterm.

What questions can be answered by the data on PhilaStats?

PhilaStats can help you answer many questions about population health, including:

  • What causes of death are responsible for mortality at younger ages?
  • Does life expectancy at birth differ by sex and race/ethnicity?
  • Have we made progress toward reducing the number of teen parents?
  • Is mortality from diabetes more common in areas of the city where fewer residents have health insurance?

In this series of posts, we’ll explore the first two questions.

What kills young people in Philadelphia?

PhilaStats allows users to explore leading causes of death in Philadelphia via the mortality tab. Box C shows leading causes of death in each year from 2012 to 2019. What are the most common causes of death?

View code
lcf_count <- leading_cause_figure(metric_name_input = "Count",
                     title_input = "death")

Click here to expand and visit PhilaStats to customize.

The most common causes of death in 2019 were heart disease, cancer, drug overdoses, cerebrovascular diseases (stroke), and chronic lower respiratory diseases. Unintentional injuries, septicemia, chronic kidney disease, homicide, and diabetes round out the ten leading causes of death. Most of these causes of death are diseases that occur at older ages, since most Philadelphians die as older adults.

To see what is driving mortality in younger Philadelphians, we can switch the metric from causes of death to causes of premature death. Years of potential life lost (YPLL) measures exactly this: it estimates the total number of years of life lost by those who died prematurely, which we define here as before age 75. If someone died at 60, they would contribute 15 years to YPLL, and if someone died at 30, they would contribute 45 years. YPLL is calculated as a sum across all people who died prematurely, representing all the person-years of life lost before age 75. Thus, YPLL gives more weight to causes of death that occur at younger ages than to causes that occur at older ages.

View code
lcf_years <- leading_cause_figure(metric_name_input = "Years of potential life lost to age 75",
                     title_input = "years of potential life lost to age 75")

Click here to expand and visit PhilaStats to customize.

The list of leading causes of premature death looks different from the list of leading causes of death; notably, external causes of death jump up in the ranking. In 2019, drug overdoses were the biggest contributor to YPLL, with homicide coming in fourth (behind heart disease and cancer). Certain conditions originating in the perinatal period (after 28 weeks gestation or within 7 days of birth) are the sixth leading contributor to YPLL, even though deaths due to these conditions are relatively rare. This is because deaths caused by perinatal conditions – like injuries or infections during the birthing process – occur early in life by definition, so a single death during the perinatal period contributes nearly 75 years to the metric YPLL before age 75.

This figure can be further customized to show leading causes of death and YPLL among specific demographic groups: by sex, race/ethnicity, and age group. How do leading causes of death differ between Hispanic men and non-Hispanic Black men? Do non-Hispanic Asian women die of the same causes of death as non-Hispanic white women? Explore the mortality tab of PhilaStats to find out.

How does life expectancy at birth differ by sex and race/ethnicity? In the next post in this series, we’ll use PhilaStats to explore this question.

We would love to hear how you’re using PhilaStats. You can reach the Data Lab by email, or find Megan, the Director of the Data Lab, on Twitter.