5  Spatial data analysis

Get the lesson R script: spatial_data_analysis.R

Get the lesson data: download zip

5.1 Lesson Outline

5.2 Lesson Exercises

R has been around for over thirty years and most of its use has focused on statistical analysis. Tools for spatial analysis have also been developed that allow the use of R as a full-blown GIS with capabilities similar or even superior to commerical software. This lesson will focus on geospatial analysis using the simple features package. We will focus entirely on working with vector data in this lesson, but checkout the terra package if you want to work with raster data in R.

The goals for today are:

  1. Understand the vector data structure

  2. Understand how to import and structure vector data in R

  3. Understand how R stores spatial data using the simple features package

  4. Execute basic geospatial functions in R

5.3 Vector data

Most of us are probably familiar with the basic types of spatial data and their components. We’re going to focus entirely on vector data for this lesson because these data are easily conceptualized as features or discrete objects with spatial information. We’ll discuss some of the details about this later. Raster data, by contrast, are stored in a grid with cells associated with values. Raster data are more common for data with continuous coverage, such as climate or weather layers.

Vector data come in three flavors. The simplest is a point, which is a 0-dimensional feature that can be used to represent a specific location on the earth, such as a single tree or an entire city. Linear, 1-dimensional features can be represented with points (or vertices) that are connected by a path to form a line and when many points are connected these form a polyline. Finally, when a polyline’s path returns to its origin to represent an enclosed 2-dimensional space, such as a forest, watershed boundary, or lake, this forms a polygon.

Image source

All vector data are represented similarly, whether they’re points, lines or polygons. Points are defined by a single coordinate location, whereas a line or polygon include several points with a grouping variable that distinguishes one object from another. In all cases, the aggregate dataset is composed of one or more features of the same type (points, lines, or polygons).

There are two other pieces of information that are included with vector data. The attributes that can be associated with each feature and the coordinate reference system or CRS. The attributes can be any supporting information about a feature, such as a text description or summary data about the features. You can identify attributes as anything in a spatial dataset that is not explicitly used to define the location (or geometry) of the features.

The CRS is used to establish a frame of reference for the locations in your spatial data. The chosen CRS ensures that all features are correctly referenced relative to each other, especially between different datasets. As a simple example, imagine comparing length measurements for two objects where one was measured in centimeters and another in inches. If you didn’t know the unit of measurement, you could not compare relative lengths. The CRS is similar in that it establishes a common frame of reference, but for spatial data. An added complication with spatial data is that location can be represented in both 2-dimensional or 3-dimensional space. This is beyond the scope of this lesson, but for any geospatial analysis you should be sure that:

  1. the CRS is the same when comparing datasets, and

  2. the CRS is appropriate for the region you’re looking at.

Image source

To summarize, vector data include the following:

  1. spatial data (e.g., latitude, longitude) as points, lines, or polygons

  2. attributes

  3. a coordinate reference system

These are all the pieces of information you need to recognize in your data when working with features in R.

5.4 Simple features

R has a long history of packages for working with spatial data. For many years, the sp package was the standard and most widely used toolset for working with spatial data in R. This package laid the foundation for creating spatial data classes and methods in R, but unfortunately its development predated a lot of the newer tools that are built around the tidyverse. This makes it incredibly difficult to incorporate sp data objects with these newer data analysis workflows.

The simple features or sf package was developed to streamline the use of spatial data in R and to align its functionality with those provided in the tidyverse. The sf package has replaced sp as the fundamental spatial model in R for vector data. A major advantage of sf, as you’ll see, is its intuitive data structure that retains many familiar components of the data.frame (or more accurately, tibble).

The sf package provides a hierarchical data model that represents a wide range of geometry types - it includes all common vector geometry types and even allows geometry collections, which can have multiple geometry types in a single object. From the first sf package vignette we see:

You’ll notice that these are the same features we described above, with the addition of “multi” features and geometry collections that include more than one type of feature.

5.5 Exercise 12

Let’s get setup for this lesson. We’ll make sure we have the necessary packages installed and loaded. Then we’ll import our datasets.

  1. Open a new script in your RStudio project or within RStudio cloud.

  2. At the top of the script, load the tidyverse, sf, and mapview libraries. Don’t forget you can use install.packages(c('tidyverse', 'sf', 'mapview')) if the packages aren’t installed.

  3. Load the fishdat.csv, statloc.csv, and sgdat.shp datasets from your data folder. For the csv files, use read_csv() and for the shapefile, use the st_read() function from the sf package. The shapefile is seagrass polygon data for Old Tampa Bay in 2016. As before, assign each loaded dataset to an object in your workspace.

Click to show/hide solution
# load libraries
library(tidyverse)
library(sf)
library(mapview)

# load the fish data
fishdat <- read_csv('data/fishdat.csv')

# load the station data
statloc <- read_csv('data/statloc.csv')

# load the sgdat shapefile
sgdat <- st_read('data/sgdat.shp')

5.6 Creating spatial data with simple features

Now that we’re setup, let’s talk about how the sf package can be used. After the package is loaded, you can check out all of the methods that are available for sf data objects. Many of these names will look familiar if you’re familiar with geospatial analysis methods. We’ll use some of these later.

methods(class = 'sf')
  [1] [                            [[<-                        
  [3] [<-                          $<-                         
  [5] aggregate                    anti_join                   
  [7] arrange                      as.data.frame               
  [9] cbind                        coerce                      
 [11] dbDataType                   dbWriteTable                
 [13] distinct                     dplyr_reconstruct           
 [15] drop_na                      duplicated                  
 [17] filter                       full_join                   
 [19] gather                       group_by                    
 [21] group_split                  identify                    
 [23] initialize                   inner_join                  
 [25] left_join                    mapView                     
 [27] merge                        mutate                      
 [29] nest                         pivot_longer                
 [31] pivot_wider                  plot                        
 [33] points                       print                       
 [35] rbind                        rename_with                 
 [37] rename                       right_join                  
 [39] rowwise                      sample_frac                 
 [41] sample_n                     select                      
 [43] semi_join                    separate_rows               
 [45] separate                     show                        
 [47] slice                        slotsFromS3                 
 [49] spread                       st_agr                      
 [51] st_agr<-                     st_area                     
 [53] st_as_s2                     st_as_sf                    
 [55] st_as_sfc                    st_bbox                     
 [57] st_boundary                  st_break_antimeridian       
 [59] st_buffer                    st_cast                     
 [61] st_centroid                  st_collection_extract       
 [63] st_concave_hull              st_convex_hull              
 [65] st_coordinates               st_crop                     
 [67] st_crs                       st_crs<-                    
 [69] st_difference                st_drop_geometry            
 [71] st_exterior_ring             st_filter                   
 [73] st_geometry                  st_geometry<-               
 [75] st_inscribed_circle          st_interpolate_aw           
 [77] st_intersection              st_intersects               
 [79] st_is_full                   st_is_valid                 
 [81] st_is                        st_join                     
 [83] st_line_merge                st_m_range                  
 [85] st_make_valid                st_minimum_bounding_circle  
 [87] st_minimum_rotated_rectangle st_nearest_points           
 [89] st_node                      st_normalize                
 [91] st_point_on_surface          st_polygonize               
 [93] st_precision                 st_reverse                  
 [95] st_sample                    st_segmentize               
 [97] st_set_precision             st_shift_longitude          
 [99] st_simplify                  st_snap                     
[101] st_sym_difference            st_transform                
[103] st_triangulate_constrained   st_triangulate              
[105] st_union                     st_voronoi                  
[107] st_wrap_dateline             st_write                    
[109] st_z_range                   st_zm                       
[111] summarise                    text                        
[113] transform                    transmute                   
[115] ungroup                      unite                       
[117] unnest                      
see '?methods' for accessing help and source code

All of the functions and methods in sf are prefixed with st_, which stands for ‘spatial and temporal’. This is kind of confusing but this is in reference to standard methods available in PostGIS, an open-source backend that is used by many geospatial platforms. An advantage of this prefixing is all commands are easy to find with command-line completion in sf, in addition to having naming continuity with the core, prior software.

There are two ways to create a spatial data object in R, i.e., an sf object, using the sf package.

  1. Directly import a shapefile

  2. Convert an existing R object with latitude/longitude data that represent point features

We’ve already imported a shapefile in Exercise 12, so let’s look at its structure to better understand the sf object. The st_read() function can be used for import. Setting quiet = T will keep R from being chatty when it imports the data.

sgdat <- st_read('data/sgdat.shp', quiet = T)
sgdat
Simple feature collection with 575 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -82.72462 ymin: 27.81386 xmax: -82.48426 ymax: 28.03548
Geodetic CRS:  WGS 84
First 10 features:
   OBJECTID FLUCCS                       geometry
1       211   9113 POLYGON ((-82.64982 28.0200...
2       396   9116 POLYGON ((-82.60445 27.9828...
3       398   9113 POLYGON ((-82.58985 27.8230...
4       399   9113 POLYGON ((-82.62025 27.9946...
5       416   9116 POLYGON ((-82.62363 28.0001...
6       451   9113 POLYGON ((-82.58943 27.8232...
7       491   9113 POLYGON ((-82.60703 27.8742...
8       499   9113 POLYGON ((-82.54753 27.9595...
9       515   9116 POLYGON ((-82.53872 27.8979...
10      539   9113 POLYGON ((-82.54837 27.9617...

What does this show us? Let’s break it down.

  • In green, metadata describing components of the sf object
  • In yellow, a simple feature: a single record, or data.frame row, consisting of attributes and geometry
  • In blue, a single simple feature geometry (an object of class sfg)
  • In red, a simple feature list-column (an object of class sfc, which is a column in the data.frame)

We’ve just imported a polygon dataset with 575 features and 2 fields. The dataset uses the WGS 84 geographic CRS (i.e., latitude/longitude). You’ll notice that the actual dataset looks very similar to a regular data.frame, with some interesting additions. The header includes some metadata about the sf object and the geometry column includes the actual spatial information for each feature. Conceptually, you can treat the sf object like you would a data.frame.

Easy enough, but what if we have point data that’s not a shapefile? You can create an sf object from any existing data.frame so long as the data include coordinate information (e.g., columns for longitude and latitude) and you are 100% about the CRS. We can do this with our statloc csv file that we imported.

str(statloc)
spc_tbl_ [2,173 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Reference: chr [1:2173] "TBM1996032006" "TBM1996032004" "TBM1996032207" "TBM1996042601" ...
 $ Latitude : num [1:2173] 27.9 27.9 27.9 28 27.9 ...
 $ Longitude: num [1:2173] -82.6 -82.6 -82.5 -82.7 -82.6 ...
 - attr(*, "spec")=
  .. cols(
  ..   Reference = col_character(),
  ..   Latitude = col_double(),
  ..   Longitude = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

The st_as_sf() function can be used to make this data.frame into a sf object. We must identify which columns contain the coordinates and provide the CRS information, which is WGS84. You can use the EPSG code 4326 to indicate WGS84.

sfstatloc <- st_as_sf(statloc, coords = c('Longitude', 'Latitude'), crs = 4326)
sfstatloc
Simple feature collection with 2173 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -82.72437 ymin: 27.82277 xmax: -82.53105 ymax: 28.03118
Geodetic CRS:  WGS 84
# A tibble: 2,173 × 2
   Reference                 geometry
 * <chr>                  <POINT [°]>
 1 TBM1996032006  (-82.62017 27.9105)
 2 TBM1996032004  (-82.5545 27.85533)
 3 TBM1996032207 (-82.53317 27.92667)
 4 TBM1996042601  (-82.6925 27.98267)
 5 TBM1996051312   (-82.59267 27.861)
 6 TBM1996051407 (-82.60517 27.91533)
 7 TBM1996051415 (-82.67617 27.93633)
 8 TBM1996032209 (-82.64083 27.97017)
 9 TBM1996041807 (-82.54433 27.86317)
10 TBM1996042602 (-82.71983 27.95567)
# ℹ 2,163 more rows

A big part of working with spatial data is keeping track of coordinate reference systems between different datasets. Remember that meaningful comparisons between datasets are only possible if the CRS is the same.

There are many, many types of reference systems and plenty of resources online that provide detailed explanations of the what and why behind the CRS (see spatialreference.org or this guide from NCEAS). For now, just realize that we can use a simple text string in R to indicate which CRS we want.

5.7 Exercise 13

Our fishdat dataset can be made a simple features spatial object by joining it with the corresponding station data. Let’s join the fishdat and statloc datasets and create an sf object using st_as_sf() function.

  1. Join fishdat to statloc using the left_join() function with by = "Reference" as the key.

  2. Use st_as_sf() to make the joined dataset an sf object. There are two arguments you need to specify with st_as_sf(): coords = c('Longitude', 'Latitude') so R knows which columns in your dataset are the x/y coordinates and crs = 4326 to specifiy the CRS as WGS 84.

  3. When you’re done, inspect the dataset. How many features are there? What type of spatial object is this?

Click to show/hide solution
# Join the data
alldat <- left_join(fishdat, statloc, by = 'Reference')

# create spatial data object
alldat <- st_as_sf(alldat, coords = c('Longitude', 'Latitude'), crs = 4326)

# examine the sf object
alldat
str(alldat)

There’s a shortcut for specifying the CRS if you don’t know which one to use. Remember, for spatial anlaysis make sure to only work with datasets that have the same coordinate reference systems. The st_crs() function tells us the CRS for an existing sf obj.

# check crs
st_crs(alldat)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

When we created the alldat dataset, we could have used the CRS from the sgdat object that we created in Exercise 12 by using crs = st_crs(sgdat) for the crs argument. This is possible because we have prior knowledge that all of the data use the WGS84 CRS. This is often a quick shortcut for creating an sf object without having to look up the CRS number.

We can verify that both have the same CRS.

# verify the polygon and point data have the same crs
st_crs(sgdat)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(alldat)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Finally, you may want to use another coordinate system, such as a projection that is regionally-specific. You can use the st_transform() function to quickly change and/or reproject an sf object. For example, if we want to convert a geographic to UTM projection:

alldatutm <- alldat |> 
  st_transform(crs = '+proj=utm +zone=17 +datum=NAD83 +units=m +no_defs')
st_crs(alldatutm)
Coordinate Reference System:
  User input: +proj=utm +zone=17 +datum=NAD83 +units=m +no_defs 
  wkt:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6269]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["UTM zone 17N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-81,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]],
        ID["EPSG",16017]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

Above, we’ve used the “Proj4string” format of the CRS, instead of the EPSG code. This is another perfectly acceptable way to specify a CRS. Also note that transformations can only be done after the original data are correctly imported using the native CRS of the dataset.

5.8 Basic geospatial analysis

Let’s perform some simple geospatial analysis comparing the fisheries data with the seagrass data. As with any analysis, let’s take a look at the data to see what we’re dealing with before we start comparing the two.

plot(alldat)

plot(sgdat)

We have lots of stations where fish have been caught and lots of polygons where seagrass has been documented. You’ll also notice that the default plotting method for sf objects is to create one plot per attribute feature. This is intended behavior but sometimes is not that useful (it can also break R if you have many attributes). Maybe we just want to see where the data are located independent of any of the attributes. We can accomplish this by plotting only the geometry of the sf object.

plot(alldat$geometry)

plot(sgdat$geometry)

To emphasize the point that the sf package plays nice with the tidyverse, let’s do a simple filter on the fisheries data to look at only the 2016 data. This is the same year as the seagrass data.

filt_dat <- alldat |> 
  filter(yr == 2016)
plot(filt_dat$geometry)

Now let’s use the fisheries data and seagrass polygons to do a quick geospatial analysis. Our simple question is:

How many fish were caught where seagrass was observed in 2016?

The first task is to subset the 2016 fisheries data by locations where seagrass was observed. There are a few ways we can do this. The first is to make a simple subset where we filter the station locations using a spatial overlay with the seagrass polygons. With this approach we can see which stations were located over seagrass, but we don’t know which polygons they’re located in for the seagrass data (i.e., this is not a true spatial intersection).

fish_crop <- filt_dat[sgdat, ]
plot(fish_crop$geometry)

fish_crop
Simple feature collection with 145 features and 12 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -82.7182 ymin: 27.82623 xmax: -82.53237 ymax: 28.02418
Geodetic CRS:  WGS 84
# A tibble: 145 × 13
   OBJECTID Reference     Sampling_Date    yr  Gear ExDate              Bluefish
      <dbl> <chr>         <date>        <dbl> <dbl> <dttm>                 <dbl>
 1  2739992 TBM2016040802 2016-04-08     2016    20 2018-04-12 11:06:24        0
 2  2740055 TBM2016050703 2016-05-02     2016   160 2018-04-12 11:06:24        0
 3  2740233 TBM2016062003 2016-06-20     2016    20 2018-04-12 11:06:40        0
 4  2740253 TBM2016062214 2016-06-22     2016    20 2018-04-12 11:06:40        0
 5  2741642 TBM2016021207 2016-02-11     2016    20 2018-04-12 11:06:13        0
 6  2741698 TBM2016031206 2016-03-09     2016    20 2018-04-12 11:06:13        0
 7  2741782 TBM2016021204 2016-02-11     2016    20 2018-04-12 11:06:03        0
 8  2741834 TBM2016031202 2016-03-09     2016    20 2018-04-12 11:06:04        0
 9  2741867 TBM2016040401 2016-04-26     2016   160 2018-04-12 11:06:16        0
10  2742220 TBM2016052108 2016-05-17     2016    20 2018-04-12 11:06:20        0
# ℹ 135 more rows
# ℹ 6 more variables: `Common Snook` <dbl>, Mullets <dbl>, Pinfish <dbl>,
#   `Red Drum` <dbl>, `Sand Seatrout` <dbl>, geometry <POINT [°]>

The second and more complete approach is to intersect the two data objects to subset and combine the attributes. We can use st_intersection() to both overlay and combine the attribute fields from the two data objects.

fish_int <- st_intersection(filt_dat, sgdat)
plot(fish_int$geometry)

fish_int
Simple feature collection with 145 features and 14 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -82.7182 ymin: 27.82623 xmax: -82.53237 ymax: 28.02418
Geodetic CRS:  WGS 84
# A tibble: 145 × 15
   OBJECTID Reference     Sampling_Date    yr  Gear ExDate              Bluefish
 *    <dbl> <chr>         <date>        <dbl> <dbl> <dttm>                 <dbl>
 1  2743920 TBM2016040806 2016-04-08     2016    20 2018-04-12 11:06:16        0
 2  2814418 TBM2016110509 2016-11-09     2016   160 2018-04-12 11:06:55        0
 3  2746066 TBM2016011205 2016-01-14     2016    20 2018-04-12 11:05:59        0
 4  2747535 TBM2016040205 2016-04-05     2016    20 2018-04-12 11:06:09        0
 5  2753329 TBM2016090504 2016-09-26     2016   160 2018-04-12 11:06:52        0
 6  2755935 TBM2016091405 2016-09-26     2016    20 2018-04-12 11:06:51        0
 7  2757993 TBM2016121203 2016-12-05     2016   160 2018-04-12 11:07:20        0
 8  2759789 TBM2016110507 2016-11-09     2016    20 2018-04-12 11:07:11        0
 9  2762371 TBM2016110507 2016-11-09     2016    20 2018-04-12 11:06:53        0
10  2762536 TBM2016121203 2016-12-05     2016   160 2018-04-12 11:07:23        0
# ℹ 135 more rows
# ℹ 8 more variables: Common.Snook <dbl>, Mullets <dbl>, Pinfish <dbl>,
#   Red.Drum <dbl>, Sand.Seatrout <dbl>, OBJECTID.1 <int>, FLUCCS <chr>,
#   geometry <POINT [°]>

Now we can easily see which stations are in which seagrass polygon. We can use some familiar tools from dplyr to get the aggregate catch data for the different polygon counts. Specifically, the FLUCCS attribute identifies seagrass polygons as continuous (9116) or patchy (9113) (metadata here). Maybe we want to summarise species counts by seagrass polygon type. Here, we need to specify the FLUCCS grouping variable outside of the summarize function.

fish_cnt <- fish_int |>
  group_by(FLUCCS) |> 
  summarise(
    Count = sum(Pinfish)
  ) 
fish_cnt
Simple feature collection with 2 features and 2 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: -82.7182 ymin: 27.82623 xmax: -82.53237 ymax: 28.02418
Geodetic CRS:  WGS 84
# A tibble: 2 × 3
  FLUCCS Count                                                          geometry
  <chr>  <dbl>                                                  <MULTIPOINT [°]>
1 9113    1559 ((-82.53352 27.92907), (-82.53617 27.91043), (-82.53535 27.91212…
2 9116    4766 ((-82.55893 27.96612), (-82.5588 27.96633), (-82.55797 27.96632)…

Notice that we’ve retained the sf data structure in the aggregated dataset but the structure is now slightly different, both in the attributes and the geometry column. We now have only two attributes, FLUCCS and Count. The geometry column is retained but now is aggregated to a multipoint object where all points in continous or patchy seagrass are grouped by row. This is a really powerful feature of sf: spatial attributes are retained during the wrangling process.

We can also visualize this information with ggplot2. This plot shows how many pinfish were caught in the two seagrass coverage categories for all of Old Tampa Bay (9113: patchy, 9116: continuous).

ggplot(fish_cnt, aes(x = FLUCCS, y = Count)) + 
  geom_bar(stat = 'identity')

This is not a groundbreaking plot, but we can clearly see that pinfish are more often found in dense seagrass beds (9116).

5.9 Quick mapping

Cartography or map-making is also very doable in R. Like most applications, it takes very little time to create something simple, but much more time to create a finished product. We’ll focus on the simple process using ggplot2 and the mapview package just to get you started. Both packages work “out-of-the-box” with sf data objects.

For ggplot2, all we need is to use the geom_sf() geom.

# use ggplot with sf objects
ggplot() + 
  geom_sf(data = sgdat, aes(fill = FLUCCS)) + 
  geom_sf(data = fish_int) 

The mapview packages lets us create interactive maps to zoom and select data. Note that we can also combine separate mapview objects with the + operator.

mapview(sgdat, zcol = 'FLUCCS') +
  mapview(fish_int)

There’s a lot more we can do with mapview but the point is that these maps are incredibly easy to make with sf objects and they offer a lot more functionality than static plots.

5.10 Exercise 14

We’ll spend the remaining time getting more comfortable with basic geospatial analysis and mapping in R.

  1. Start by filtering the alldat object you created in Exercise 13 to retain only 2016 data. Intersect the filtered data with the sgdat object using st_intersection().

  2. Create a map of this intersected object using mapview(). Use the zcol argument to map the color values to different attributes in your dataset, e.g., Pinfish or Bluefish.

  3. Try combining the map you made in the last step with one for seagrass (hint: mapview() + mapview()).

Click to show/hide solution
# filter and intersect data
tomap <- alldat |> 
  filter(yr == 2016) |> 
  st_intersection(sgdat)

# make map
mapview(tomap, zcol = 'Pinfish')
mapview(tomap, zcol = 'Bluefish')

# join maps
mapview(sgdat, zcol = 'FLUCCS') + mapview(tomap, zcol = 'Pinfish')

5.11 Next steps

Now you should be able to:

  1. Understand the vector data structure

  2. Understand how to import and structure vector data in R

  3. Understand how R stores spatial data using the simple features package

  4. Execute basic geospatial functions in R

This concludes our training workshop. I hope you’ve enjoyed the material and found the content useful. Please continue to use this website as a resource for developing your R skills and checkout our Data and Resources page for additional learning material.