Getting environmental justice data

Under the new Equity Strategy, our goal is to ensure at least 40% of the benefits from TBEP activities are directed to underserved communities (see Mapping underserved communities). Many of these underserved communities are also disproportionately burdened by pollution, expected impacts from climate change, and lack of green space. Understanding which communities face different burdens can help TBEP prioritize different activities to help mitigate or reduce the burdens facing these communities.

TBEP has selected the following 12 factors representing (1) those most relevant to the burdens facing communities in Tampa Bay and (2) environmental injustices that could be reduced by the benefits of TBEP activities:

Below, we have provided instructions for downloading, cleaning, and analyzing the data that will be used to characterize the burdens facing communities in Tampa Bay. To view instructions for utilizing the final data to map underserved and overburdened communities, see Mapping underserved and overburdened communities.

Load the required R packages (install first as needed).

library(sf)
library(mapview)
library(dplyr)
library(RColorBrewer)

CEJST Data

The U.S. Council on Environmental Quality has developed the Climate and Economic Justice Screening Tool (CEJST) to assist users in identifying overburdened communities using a similar methodology as EJScreen, in which national percentiles are used as thresholds for flagging census tracts that are significantly burdened by one or more indicators spanning climate change, energy, health, housing, pollution, and other dimensions. The data is available from https://screeningtool.geoplatform.gov/en/downloads.

CEJST provides the percentiles we need for 10 out of the 12 variables we use for identifying overburdened communities. Below are brief descriptions of the variables and how they are estimated, and more information about the data, sources, and methodologies can be found here.

  • Expected agricultural loss rate: Expected agricultural value at risk from losses due to fourteen types of natural hazards linked to climate change.
  • Projected flood risk: Number of properties at risk of floods occurring in the next 30 years from tides, rain, riverine or storm surges based on a climate-adjusted model.
  • PM2.5 in the air: Weight of fine inhalable particles (< 2.5 micrometers in diameter) per cubic meter.
  • Historic underinvestment: Census tracts that experienced historic underinvestment based on redlining maps created between 1935-1940.
  • Lack of green space: Share of land with developed surfaces covered with artificial materials (e.g. concrete, pavement).
  • Proximity to hazardous waste facilities: Number of hazardous waste facilities within 5 km (or nearest beyond 5 km) divided by distance.
  • Proximity to Superfund sites: Number of proposed or listed Superfund or National Priorities list (NPL) sites within 5 km (or nearest beyond 5 km) divided by distance.
  • Traffic proximity and volume: Number of vehicles (average annual daily traffic) at major roads within 500 m, divided by distance.
  • Underground storage tanks and releases: Density of leaking underground storage tanks divided by all active underground storage tanks within 1,500 ft.
  • Wastewater discharge: Modeled toxic concentrations in stream segments within 500 m, divided by distance.

Download the CEJST zip file to a temporary directory and unzip it to a second temporary directory.

urlin <- 'https://static-data-screeningtool.geoplatform.gov/data-versions/1.0/data/score/downloadable/1.0-shapefile-codebook.zip'

tmp1 <- tempfile(fileext = ".zip")
download.file(url = urlin, destfile = tmp1)

tmp2 <- tempdir()
unzip(tmp1, exdir = tmp2)

Unzip the ‘usa.zip’ file in the folder.

zip1 <- list.files(tmp2, 'usa\\.zip', full.names = T)
unzip(zip1, exdir = tmp2)

Get file path for ‘usa.shp’ and import with sf.

cejst <- list.files(tmp2, '\\.shp', full.names = T)
datcejst <- st_read(cejst)

To exclude census tracts outside of our watershed boundary, intersect the layer with the Tampa Bay watershed. If working in a different area, you will want to replace the tbshed shapefile with your own boundary file. In this case, the coordinate system is the same, so there’s no need to transform.

load(file = 'data/tbshed.RData')

dattbcejst <- datcejst %>% 
  st_intersection(tbshed)

save(dattbcejst, file = 'data/dattbcejst.RData')

View the data. It will look similar to the EJScreen data.

mapview(dattbcejst)

Unfortunately, the census tracts from CEJST are not completely identical to the census tracts in EJScreen. The tracts provided by CEJST, when they do differ, are primarily at a larger scale (e.g., one CEJST tract may be split into 2 tracts in EJScreen data). We prefer to work from the higher resolution tracts (EJScreen), so we need to make sure, for each EJScreen tract, we pull the relevant data from the CEJST tract. Notably, some of the resulting tract data may not be entirely accurate, as percentiles may apply to the larger CEJST tract but not necessarily to the tracts as split by the EJScreen data. As of February 2023, the tract boundaries between EJScreen and CEJST have yet to be harmonized, but this may change in future versions of these tools.

Load the ‘underserved_tract’ shapefile created from “Mapping underserved communities”.

load(file = 'data/underserved_tract.RData')

mapview(underserved_tract, layer.name = "Underserved Communities")

We will create representative points out of the underserved EJScreen tracts. We can then attribute the values of the underlying CEJST tracts to each point, and then merge those values into ‘underserved_tract’. Similar to the thresholds defined for identifying underserved communities, we define “overburdened” communities as those that fall within the 80th percentile (or greater) nationally on at least 1 of the 12 measures of environmental justice outlined above.

cejstvalues <- st_point_on_surface(underserved_tract) %>%
  st_intersection(dattbcejst) %>%
  mutate(thresholdEJ_agloss = ifelse(EALR_PFS >= 0.80, 1, 0),
         thresholdEJ_floodr = ifelse(FLD_PFS >= 0.80, 1, 0),
         thresholdEJ_greens = ifelse(IS_PFS >= 0.80, 1, 0),
         thresholdEJ_pm2.5 = ifelse(PM25F_PFS >= 0.80, 1, 0),
         thresholdEJ_trafic = ifelse(TF_PFS >= 0.80, 1, 0),
         thresholdEJ_wastew = ifelse(WF_PFS >= 0.80, 1, 0),
         thresholdEJ_hwaste = ifelse(TSDF_PFS >= 0.80, 1, 0),
         thresholdEJ_ugtank = ifelse(UST_PFS >= 0.80, 1, 0),
         thresholdEJ_sfunds = ifelse(NPL_PFS >= 0.80, 1, 0),
         thresholdEJ_redlin = ifelse(HRS_ET >= 0.80, 1, 0)) %>%
  rowwise() %>%
  select(matches('^thresholdEJ|^ID')) %>%
  as.data.frame()

underserved_cejst <- left_join(underserved_tract, cejstvalues, by = 'ID')

Mining Data

Although CEJST provides data on abandoned coal mines, phosphate mining is a unique characteristic of Florida, and especially Tampa Bay. Of the 78 phosphate mining operations in the U.S. recorded by the U.S. Geological Survey, 29 (37%) are located in Florida, and 13 (17%) are within the Tampa Bay watershed. The potential impacts of phosphate mining are thus a unique burden to communities in our watershed, which is why we have opted to include phosphate rather than coal mines in our identification of overburdened communities.

Spatial data is available from the Florida Department of Environmental Protection (FDEP) here. This is provided as a polygon layer, showing all active mandatory phosphate mines in Florida as of 2019. See the FDEP website for more details on the dataset.

The shapefile can be read in directly with st_read.

mined <- st_read('https://ca.dep.state.fl.us/arcgis/rest/services/OpenData/MMP_MINEDUNITS/MapServer/0/query?outFields=*&where=1%3D1&f=geojson')

View a map of all active mandatory phosphate mines in Florida.

mapview(mined)

CEJST considers any census tract containing abandoned coal mines to be significantly burdened. We adopt the same approach for phosphate mines in Tampa Bay. Create an indicator for tracts that overlap with the phosphate mining units.

sf_use_s2(FALSE)

maxval <- underserved_cejst %>%
  st_join(mined) %>%
  mutate(SITE_ID = coalesce(SITE_ID, 0)) %>% 
  mutate(dummyvar = ifelse(SITE_ID > 0, 1, 0)) %>%
  group_by(ID) %>%
  summarise(thresholdEJ_phmine = max(dummyvar)) %>%
  as.data.frame()

underserved_cejst_mines <- left_join(underserved_cejst, maxval, by = 'ID')

Brownfield Data

Properties in which expansion, redevelopment, or reuse may be complicated by the potential presence of contaminants, called “brownfields,” are another important burden to some communities in Tampa Bay. While CEJST and EJScreen do not include the proximity of census tracts to brownfield sites in their national percentiles, the EPA hosts a database with the lat/lon coordinates of brownfield properties, called the Assessment, Cleanup, and Redevelopment Exchange System (ACRES). The data is available as KML point data here.

Download the zipped KML file to a temporary directory.

# url with zipped kml
urlin <- 'https://ordsext.epa.gov/FLA/www3/acres_frs.kmz'

# download file
tmp1 <- tempfile(fileext = ".kmz")
download.file(url = urlin, destfile = tmp1, method = 'curl')

Unzip the KMZ file.

tmp2 <- tempdir()
unzip(tmp1, exdir = tmp2)

Get the name of the KML file to read.

lyr <- unzip(tmp1, list = T)$Name
fl <- paste(c(tmp2, lyr), collapse = "\\")
fl <- gsub('\\\\', '/', fl)

Read the KML file with st_read and drop the Z dimension with st_zm. If you would like to only view brownfield sites in a particular area, below is an example for loading just the sites in the Tampa layer. You can view all possible locations in the kml file with st_layers. However, we need to create our brownfield metric using the nation-wide data, so you can skip this step.

dat <- st_read(fl, layer = 'TAMPA') %>% 
  st_zm()

To import all layers in the kml file, identify the layer names and loop through them to add to a single object. The data are saved as an .Rdata object and .csv file for later use. The data include only the site name and location in decimal degrees.

NOTE: This code takes several hours to complete. We have provided the code so that you may replicate this approach if desired, but for the sake of time, we recommend loading the Rdata object we have saved already (next step).

# layer names
alllyr <- st_layers(fl)$name

strt <- Sys.time()
out <- NULL
for(i in alllyr){
  
  # counter
  cat(i, which(i == alllyr), 'of', length(alllyr), '\n')
  print(Sys.time() - strt)
  
  # import each layer
  dat <- st_read(fl, i, quiet = T)[, c('Name')]
  
  # append to same object
  out <- rbind(out, dat)
  
}

# save as RData object
allbfld <- out %>% st_zm()
save(allbfld, file = 'data/allbfld.RData', compress = 'xz')

# save as csv
allbfldcsv <- allbfld %>% 
  mutate(
    lon = st_coordinates(.)[,1], 
    lat = st_coordinates(.)[,2]
  ) %>% 
  st_set_geometry(NULL)
write.csv(allbfldcsv, 'data/allbfldcsv.csv', row.names = F)

Load and view the brownfield sites.

load(file = 'data/allbfld.RData')
mapview(allbfld, legend = F, col.regions = 'brown')

You can see the locations of all brownfield sites from the ACRES database across the country. We will use this data in a manner consistent with how CEJST measures a community’s proximity to hazardous waste facilities and Superfund sites.

Load the national census tract data from EJScreen if you have it saved from Getting demographic data. If not, download the data again using the same steps as before.

# url with zip gdb to download
urlin <- 'https://gaftp.epa.gov/EJSCREEN/2022/EJSCREEN_2022_Supplemental_with_AS_CNMI_GU_VI_Tracts.gdb.zip'

# download file
tmp1 <- tempfile(fileext = ".zip")
download.file(url = urlin, destfile = tmp1)

# unzip file
tmp2 <- tempdir()
utils::unzip(tmp1, exdir = tmp2)

# get the layers from the gdb
gdbpth <- list.files(tmp2, pattern = '\\.gdb$', full.names = T)
gdbpth <- gsub('\\\\', '/', gdbpth)
lyr <- st_layers(gdbpth)$name

# read the layer, keep only the tract ID to reduce size
us_tract <- st_read(dsn = gdbpth, lyr) %>%
  rowwise() %>%
  select(matches('^ID'))

First, we need to transform the brownfield and national EJScreen census tract data to an appropriate projected coordinate system so we can reliably calculate distances. For the U.S. (including Alaska and Hawaii), we will use the North America Albers Equal Area Conic (EPSG: 9822) projection.

bfields <- st_transform(allbfld, crs = 9822)
tracts <- st_transform(us_tract, crs = 9822)

In line with the methodology for calculating proximity to hazardous waste facilities and Superfund sites, we will calculate each census tract’s proximity to brownfield sites as (1) the number of brownfield sites within 5 km of the tract divided by 5 km, or (2) if there are no sites within 5 km, then 1 divided by the distance to the nearest site. We will use Euclidean distances. Note that the units for distances will be in meters.

# 5 km buffer (this will take several minutes)
tractsbuff <- st_buffer(tracts, dist = 5000, endCapStyle = "ROUND")

# number of brownfield sites within 5 km
tractsbuff$brownfields_N <- lengths(st_intersects(tractsbuff, bfields))

# distance to nearest brownfield site
nearest <- st_nearest_feature(tracts, bfields)
distance <- st_distance(tracts, bfields[nearest,], by_element=TRUE)
joined <- cbind(tracts, st_drop_geometry(bfields)[nearest,]) %>%
  mutate(dist_m = distance) %>%
  as.data.frame()

proxbfield <- left_join(tractsbuff, joined, by = 'ID') %>%
  as.data.frame() %>%
  mutate(proximity = ifelse(brownfields_N > 0, brownfields_N/5000, 1/dist_m),
         percentile = ntile(proximity, 100),
         thresholdEJ_bfield = ifelse(percentile >= 80, 1, 0))

Merge with previous metrics.

underserved_cejst_mines_bfields <- left_join(underserved_cejst_mines, proxbfield, by = 'ID') %>%
  mutate(thresholdEJ_N = sum(thresholdEJ_agloss,thresholdEJ_floodr,thresholdEJ_greens,thresholdEJ_pm2.5,thresholdEJ_trafic,thresholdEJ_wastew,thresholdEJ_hwaste,thresholdEJ_ugtank,thresholdEJ_sfunds,thresholdEJ_redlin,thresholdEJ_phmine,thresholdEJ_bfield, na.rm = TRUE)) %>%
  mutate(overburdened = ifelse(thresholdEJ_N > 0, "Yes", "No")) %>%
  rowwise() %>%
  select(matches('^threshold|^ID|^under|^over'))

under_over <- underserved_cejst_mines_bfields

Save this final layer as an RData object for future use.

# save the layer as an RData object
save(under_over, file = 'data/under_over.RData')

Unlink the temporary files to delete them when you are finished.

unlink(tmp1, recursive = TRUE)
unlink(gdbpth, recursive = TRUE)
unlink(fl, recursive = TRUE)