Getting demographic data

Below, we have provided instructions for downloading the data that will be used to identify underserved communities in Tampa Bay. To view instructions for cleaning the data and utilizing the demographic indices to map underserved communities, see Mapping underserved communities.

Load the required R packages (install first as needed).

library(sf)
library(mapview)
library(dplyr)

To collect demographic data that will be used for identifying underserved communities, we will be downloading U.S. census data provided by the EPA’s 2022 Environmental Justice Screening Tool (EJScreen). This data is available from https://gaftp.epa.gov/EJSCREEN/2022/. Here you will find different versions of EJScreen data that are summarized, calculated, and visualized in different ways to meet your particular needs (e.g., census blocks or tracts, state or national percentiles, tabular or spatial data).

In our case, we are interested in obtaining spatial data for the supplemental demographic indices, summarized at the census tract level, using national percentiles as our thresholds for identifying underserved communities. Census tracts represent aggregated block groups of 1,200-8,000 people. This level is advantageous because it is the highest resolution for which the federal government provides standardized demographic, socioeconomic, and environmental data.

The appropriate file to download for our requirements at the tract level is EJSCREEN_2022_Supplemental_with_AS_CNMI_GU_VI_Tracts.gdb.zip.

Download the relevant file from EJScreen. The file is downloaded to a temporary directory (~260mb).

# url with zip gdb to download
urlin <- 'https://gaftp.epa.gov/EJSCREEN/2022/EJSCREEN_2022_Supplemental_with_AS_CNMI_GU_VI_Tracts.gdb.zip'

# download file
tmp1 <- tempfile(fileext = ".zip")
download.file(url = urlin, destfile = tmp1)

Unzip the geodatabase that was downloaded to a second temporary directory.

# unzip file
tmp2 <- tempdir()
utils::unzip(tmp1, exdir = tmp2)

Read the polygon layer from the geodatabase.

# get the layers from the gdb
gdbpth <- list.files(tmp2, pattern = '\\.gdb$', full.names = T)
gdbpth <- gsub('\\\\', '/', gdbpth)
lyr <- st_layers(gdbpth)$name

# read the layer
dat <- st_read(dsn = gdbpth, lyr)

To exclude census tracts outside of our watershed boundary, intersect the layer with the Tampa Bay watershed (available as an RData object in the source repository for this website here). If working in a different area, you will want to replace the tbshed shapefile with your own boundary file.

load(file = 'data/tbshed.RData')

# intersect the layer with the tb watershed
tb_tract <- dat %>% 
  st_transform(crs = st_crs(tbshed)) %>% 
  st_make_valid() %>% 
  st_intersection(tbshed)

The layer can be saved as an RData object if needed. The size should be minimal (~1mb).

# save the layer as an RData object (~1mb)
save(tb_tract, file = 'data/tb_tract.RData')

View the data using mapview (only the spatial data are shown). You can see that we now have the desired spatial data just for our watershed.

load(file = 'data/tb_tract.RData')

tb_tract %>%
  select(-everything()) %>% 
  mapview(layer.name = "Census tracts")

Unlink the temporary files to delete them when you are finished.

unlink(tmp1, recursive = TRUE)
unlink(gdbpth, recursive = TRUE)