Introducing PhilaStats, the new home of vital statistics for Philadelphia, part two

Jonathan’s note: This is Part Two of a series about the new vital statistics portal PhilaStats, provided by Megan Todd and Annaka Scheeres of the Data Lab at the Philadelphia Department of Public Health‘s Division of Chronic Disease and Injury Prevention. Read Part One here.

Guest post by Megan Todd and Annaka Scheeres from the Data Lab at the Philadelphia Department of Public Health, Division of Chronic Disease and Injury Prevention

View code
### load required libraries
library(tidyverse)
library(magrittr)
library(extrafont)
library(rphl)
library(ggiraph)


### load required data

## connect to base ArcGIS Rest API for citywide mortality metrics
death_url <- httr::parse_url("https://services.arcgis.com/fLeGjb7u4uXqeF9q/ArcGIS/rest/services/Vital_Mortality_Cty/FeatureServer/0/query")


## build query for ArcGIS Rest API
death_url$query <- list(where = "METRIC_NAME = 'life_expectancy' AND AGE_CATEGORY = 'At birth'",
                        # allows > 2,000 records to be loaded
                        resultType = "standard",
                        # returns all fields
                        outFields = "*",
                        returnGeometry = FALSE,
                        # specifies format of data as JSON
                        f = "json")


## load and format data frame of citywide mortality metrics
death_metrics <- jsonlite::fromJSON(httr::build_url(death_url))$features$attributes %>% 
  janitor::clean_names() %>% 
  select(-c(objectid,rank)) %>% 
  mutate(metric_name = "Life expectancy at birth",
         quality_flag = case_when(race_ethnicity %in% c("Asian (NH)","Hispanic") ~ "Some concerns due to small population size",
                                  TRUE ~ "None"),
         year = as.character(year))


### create functions for figures

## create general ggplot themes
theme_cdip_data_lab <- function() {
  theme_bw(base_family = "Open Sans") +
    theme(
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      axis.title.x =  element_blank(),
      axis.title.y = element_text(size=16),
      axis.ticks = element_blank(),
      legend.title = element_blank(),
      legend.key = element_rect(colour = "transparent", fill = "white"),
      plot.title = element_text(size=15,face = "bold"),
      plot.caption = element_text(size=13,colour = "#7a8489"),
      axis.text = element_text(size=14),
      legend.text = element_text(size=14),
      strip.background = element_rect(color = "#999690", fill = "#999690"),
      strip.text = element_text(face = "bold", color = "#ffffff", size = 14),
      panel.spacing.y = unit(1.5,"lines")
    )
}


## function to format numbers nicely
comma <- function(x) format(x, big.mark = ",",scientific=FALSE)


## set race/ethnicity colors for all plots
race_eth_colors <- c("All races/ethnicities"="#2176d2",
                     "Asian (NH)"="#96c9ff",
                     "Black (NH)"= "#f99300",
                     "Hispanic"="#58c04d",
                     "White (NH)"="#f3c613")


## set sex colors for all plots
sex_colors <- c("All sexes"="#2176d2",
                "Female"="#f3c613",
                "Male"="#96c9ff")


## function to make life expectancy figures
e_0_figure <- function(race_eth_input,fill_variable,fill_colors) {
  
  base_plot <- death_metrics %>%
    # filter data to life expectancy metric in 2019
    filter(race_ethnicity %in% race_eth_input,
           year == 2019,
           metric_name == "Life expectancy at birth") %>% 
    # set race_ethnicity as x-axis variable
    ggplot(aes(x=race_ethnicity,
               # set metric_value (i.e., life expectancy) as y-axis variable
               y=metric_value,
               # pipe in desired fill variable (sex or race_ethnicity)
               fill = {{fill_variable}},
               # define text in tooltip
               tooltip = str_c("Value: ",comma(round(metric_value,2)),
                               "\nMetric: ",metric_name,
                               "\nYear: ",year,
                               "\nRace/ethnicity: ",race_ethnicity,
                               "\nQuality concerns: ",quality_flag))) +
    # manually set color scheme
    scale_fill_manual(values=fill_colors) +
    # specify column geometry
    geom_col_interactive() +
    # facet by sex
    facet_wrap(~sex) +
    # apply pre-defined ggplot theme
    theme_cdip_data_lab() +
    # set figure and axis titles
    labs(title = "Life expectancy at birth in Philadelphia, PA (2019)",
         caption = "Source: PA Vital Registration System; Notes: NH = non-Hispanic",
         y = "Years")
  
  
  # if race/ethnicity input is longer than 1, set x-axis labels to 45 degree angle
  if (length(race_eth_input) > 1) {
    
    base_plot <- base_plot +
      guides(x = guide_axis(angle = 56))
    
  } else {base_plot}
  
  
  # convert ggplot object to girafe object
  girafe_plot <- girafe(ggobj = base_plot,
                        width_svg = 12,
                        height_svg = 6)


  # make tooltip color match color of corresponding point
  girafe_options(x = girafe_plot,
                 opts_tooltip(use_fill = TRUE,
                              css = "font-family: 'Open Sans';font-size: 14px;color:white;"))
  
}

The Philadelphia Department of Public Health recently launched a new resource for health information on the residents of Philadelphia. PhilaStats is an interactive dashboard that highlights statistics and trends in population, mortality (deaths), and natality (births) for Philadelphia residents over the past decade. The most recent year of data is 2019; we expect the state to release final numbers for 2020 sometime this summer and will update PhilaStats at that time.

In part two of this series, we will explore life expectancy at birth in Philadelphia. Missed part one? Catch up here.

How does life expectancy at birth differ by sex and race/ethnicity?

Life expectancy at birth is typically estimated in a given year as the number of years a person born in that year would live, assuming mortality rates remain constant throughout that person’s life. This assumption is false, of course; a person born in 2022 would not experience the same mortality rates at age 65 as those who were 65 in 2022. Instead, they would experience mortality rates at age 65 in 2087, which we might expect to be lower than the 2022 rates given recent trends in mortality reduction over time.

Because of this assumption, life expectancy is sensitive to mortality shocks. For example, life expectancy plummeted during the 1918 influenza pandemic; however, life expectancy was calculated for a hypothetical person who would live every year of their life in 1918, which very few people actually experienced. The same is true for the declines in life expectancy we expect to see during the COVID-19 pandemic when 2020 data are released: most people (except for infants who die the same year they’re born) will not actually live their entire life subject to these elevated mortality rates.

View code
e0_sex <- e_0_figure(race_eth_input = "All races/ethnicities", 
           fill_variable = sex, 
           fill_colors = sex_colors)

Click here to expand and visit PhilaStats to customize.

Life expectancy in Philadelphia was stable at around 76 years between 2012 and 2019. Mirroring a trend we see pretty much everywhere in the world, life expectancy in 2019 was higher for women (80.1 years) than it was for men (73.0 years). Life expectancy in Philadelphia was lower than in the US as a whole – only by a few months for women, but by more than three years for men.

View code
e0_race <- e_0_figure(race_eth_input = c("All races/ethnicities","Asian (NH)","Black (NH)","Hispanic","White (NH)"), 
           fill_variable = race_ethnicity, 
           fill_colors = race_eth_colors)

Click here to expand and visit PhilaStats to customize.

In addition to differences by sex, there was substantial variation in life expectancy by race/ethnicity as well. Of any sex and race/ethnicity combination, life expectancy in 2019 was lowest for non-Hispanic Black men at just 69.0 years. In comparison, life expectancy for non-Hispanic white men and Hispanic men was 74.8 years and 75.8 years, respectively. There was a similar trend among women in 2019: non-Hispanic Black women had a life expectancy of 77.4 years, compared to 81.0 years for non-Hispanic white women and 85.2 years for Hispanic women.

I haven’t yet mentioned life expectancy among non-Hispanic Asian groups in Philadelphia. You’ll see from the above graph that non-Hispanic Asian men and women have the highest life expectancy of any race/ethnicity group, at 90.7 years for both women and men. Interestingly, there’s no sex difference! This is unusual. Even more striking is the size of the non-Hispanic Asian advantage in life expectancy. Life expectancy for non-Hispanic Asian women is about 10.5 years higher than life expectancy for all women in Philadelphia. Life expectancy for non-Hispanic Asian men is nearly 18 years higher than it is for all men.

What could be accounting for the large non-Hispanic Asian life expectancy advantage we see in Philadelphia’s vital statistics? Is this difference real? There are two possibilities: either vital statistics are accurately capturing longer lives lived by non-Hispanic Asian Philadelphians compared to the rest of the population; or there could be problems in our data that suggest an advantage that doesn’t exist or overestimate the size of the advantage.

We have several reasons to believe that non-Hispanic Asian Philadelphians truly do live longer than the rest of the population. This pattern – albeit at a smaller scale – has been found for the US as a whole. Research has found that Asian American populations have lower mortality rates than other race/ethnicity groups in the US, no matter what specific cause of death is considered. Another stream of research has found that foreign-born Americans live longer than US-born Americans, and Asian-born Americans have exceptionally high longevity.

Are non-Hispanic Asian Philadelphians were more likely that other Philadelphians to be foreign-born? If so, this could contribute to the life expectancy advantage. This is a question we can answer relatively easily using Kyle Walker’s brilliant R package tidycensus.

View code
### Step 1: Set up for accessing data ----------------------------- 

## load required packages for accessing and analyzing Census data 
library(tidycensus) 
library(srvyr) 


## Set Census API key (can be obtained from http://api.census.gov/data/key_signup.html) 
# census_api_key("enter API key here", install = T) 


## pums_variables: searchable data catalog built into tidycensus 
pums_vars <- pums_variables %>%
  filter(year == 2019, survey == "acs5") %>%
  distinct(var_code, var_label, data_type, level) %>%
  filter(level == "person", !str_detect(var_label, "allocation flag"))


## list of Philadelphia pumas 
phila_pumas <- c("03201", "03202", "03203", "03204", "03205", "03206", "03207", "03208","03209", "03210", "03211") 


### Step 2: Importing data from tidycensus::get_pums() --------------------- 
if(file.exists("cached/phila_pums_2019.RDS")){
  phila_pums_2019 <- readRDS("cached/phila_pums_2019.RDS")
} else {
phila_pums_2019 <- get_pums(
  state = "PA", 
  puma = phila_pumas, 
  survey = "acs5", 
  year = 2019,
  recode = TRUE, # note: not available for 2020 yet 
  # select desired variables to include 
  variables = c("RAC1P","HISP","NATIVITY"),
  # load person level weights for weighted survey object 
  rep_weights = "person") %>% 
  # recode NATIVITIY variable into binary native and foreign born variables
  mutate(native_born = NATIVITY == 1, 
         foreign_born = NATIVITY == 2, 
         # create recoded race/ethnicity field based on RAC1P and HISP variables
         race_ethnicity = factor(
           case_when(HISP != "01" ~ "Hispanic",
                     RAC1P == 6 & HISP == "01" ~ "Asian, non-Hispanic",
                     RAC1P == 2 & HISP == "01" ~ "Black, non-Hispanic",
                     RAC1P == 1 & HISP == "01" ~ "White, non-Hispanic",
                     TRUE ~ "Other, non-Hispanic")
           ),
         # rearrange levels of factor so "Other, non-Hispanic" is last
         race_ethnicity = fct_relevel(race_ethnicity,"Other, non-Hispanic",after=4L))
}


### Step 3 - Create weighted survey object ---------------------------- 

## create weighted survey object with person-level weights
w_phila_pums_2019 <- to_survey(phila_pums_2019, type = "person") 
### Step 4 - Generate table output ---------------------------- 

## create formatted output HTML table 
w_phila_pums_2019 %>% 
  # group by race and Hispanic ethnicity variables 
  group_by(race_ethnicity) %>% 
  # select native_born and foreign_born columns to summarise
  summarise(across(ends_with("_born"),
                   # create list of functions to apply to selected columns: survey mean and total
                   # vartype = NULL excludes uncertainty estimates from output 
                   .fns = list(p = ~survey_mean(.x, vartype = NULL),
                               t = ~survey_total(.x, vartype = NULL)))) %>% 
  # add total population summary row
  bind_rows(w_phila_pums_2019 %>% 
              # select native_born and foreign_born columns to summarise
              summarise(across(ends_with("_born"),
                   # create list of functions to apply to selected columns: survey mean and total
                   # vartype = NULL excludes uncertainty estimates from output
                   .fns = list(p = ~survey_mean(.x, vartype = NULL),
                               t = ~survey_total(.x, vartype = NULL)))) %>% 
              mutate(race_ethnicity = "Total Population")) %>% 
  # format %s and totals nicely
  mutate(across(ends_with("_p"),~str_c(round(.x*100),"%")),
         across(ends_with("_t"),~comma(.x))) %>% 
  # convert table to HTML format 
  knitr::kable(format = "html",
               # clean up column names 
               col.names = c("Race/Ethnicity", 
                             "% Native Born",
                             "# Native Born", 
                             "% Foreign Born", 
                             "# Foreign Born"),
               caption = "Nativity by Race/Ethnicity in Philadelphia, PA (2019)") %>% 
  # add footnote with data source to table
  kableExtra::footnote(general = "Source: American Community Survey Public Use Microdata Series") %>% 
  # add nice styling to HTML table 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                            full_width = T) 
Nativity by Race/Ethnicity in Philadelphia, PA (2019)
Race/Ethnicity % Native Born # Native Born % Foreign Born # Foreign Born
Asian, non-Hispanic 34% 38,492 66% 75,241
Black, non-Hispanic 93% 595,028 7% 47,731
Hispanic 81% 187,232 19% 45,000
White, non-Hispanic 91% 495,663 9% 48,834
Other, non-Hispanic 85% 39,645 15% 6,967
Total Population 86% 1,356,060 14% 223,773
Note:
Source: American Community Survey Public Use Microdata Series

Click here to expand.

The table above shows that 66% of non-Hispanic Asian Philadelphians were born outside of the US, compared to only 14% for the city as a whole. So, non-Hispanic Asian Philadelphians are disproportionately foreign-born, which may explain why life expectancy for this group is so much higher than any other race/ethnicity group.

Okay, so we hope we’ve convinced you that non-Hispanic Asian Philadelphians really do have a life expectancy advantage over other race/ethnicity groups. But the size of the advantage might be overestimated because of inaccuracies in the data.

One documented problem is that race/ethnicity is inconsistency recorded, especially for Asian Americans. Estimates of mortality, which feed into life expectancy calculations, typically use data from two sources: the race/ethnicity recorded on death certificates is used for the numerator (number of deaths) and the race/ethnicity from population estimates like the Census is used for the denominator (number of people). When race/ethnicity is recorded in the Census, it’s almost always reported by the person themselves, or by another member of the household. Race/ethnicity on a death certificate, however, is more likely to be reported by someone who may not know the person’s race/ethnicity, such as a physician or medical examiner.

Studies have compared how the race/ethnicity of specific individuals is recorded in various data sources over time to assess the consistency and accuracy of these data sources. For example, a person’s race/ethnicity may be recorded in one way in a decennial Census (when race/ethnicity is reported either by themselves or a household member), and another way on their death certificate (which may be reported by someone who does not know the person’s race/ethnicity). These studies have found that race/ethnicity is reported fairly accurately on the death certificate – meaning, the same race/ethnicity is identified on the death certificate as was self-selected on an earlier Census form or survey – only for non-Hispanic white and non-Hispanic Black Americans. Hispanic Americans, Asian Americans, and especially American Indians and Alaska Natives (AIAN) were underreported on death certificates. That is, someone who self-identified as Asian was more likely to be misclassified as some other race/ethnicity on their death certificate.

If death certificates underestimate the number of Asian Americans (or Asian Philadelphians) who die, this means the mortality rate for Asian Americans is underestimated, and thus life expectancy for Asian Americans is overestimated. The same is true for Hispanic Americans and AIAN Americans. When you see standard life expectancy estimates based on data from death certificates and the Census for Asian, Hispanic, and AIAN populations in the US, they are likely too high. We don’t know exactly how elevated these estimates are, but some recent research suggests the overestimation is significant in statistical terms.

So our conclusion about non-Hispanic Asian life expectancy in Philadelphia is the following. Non-Hispanic Asian Philadelphians live quite a bit longer than non-Asian Philadelphians. This is likely due to the fact that Non-Hispanic Asian Philadelphians are disproportionately foreign-born compared to other groups. However, these data are not perfect, and we think the true life expectancy advantage enjoyed by non-Hispanic Asian Philadelphians may be slightly smaller than what we see on PhilaStats.

Thanks for exploring PhilaStats with us!

We would love to hear how you’re using PhilaStats. You can reach the Data Lab by email, or find Megan, the Director of the Data Lab, on Twitter.

One Reply to “Introducing PhilaStats, the new home of vital statistics for Philadelphia, part two”

  1. As I understand, life expectancy is based on part on age at death from death certificates compared to population as shown in the census. When comparing foreign born to native born individuals, however, that includes native born people who die at young ages, including birth and infancy, but it excludes the foreign born people who died at those ages and therefore never reached the USA. That seems to me likely to skew the data. Am I wrong?

Comments are closed.