Appendix B — Introduction to R

B.1 RStudio

RStudio is the go-to integrated development environment (IDE) for R. It includes many features that improve the user’s experience.

Let’s get familiar with RStudio.

B.1.1 Open R and RStudio

Find the RStudio shortcut on your computer and fire it up. You should see something like this:

There are four panes in RStudio:

  • Source: Your primary window for writing code to send to the console. This is where you write and save R “scripts”
  • Console: This is where code is executed in R
  • Environment, History, etc.: A tabbed window showing your working environment, code execution history, and other useful things
  • Files, plots, etc.: A tabbed window showing a file explorer, a plot window, list of installed packages, help files, and viewer

B.1.2 Scripting

In most cases, you will not enter and execute code directly in the console. Code can be written in a script and then sent directly to the console.

Open a new script from the File menu…

B.1.3 Executing code in RStudio

After you write code in an R script, you can send it to the Console to run it. There are two ways to do this. First, you can click the Run button at the top right of the scripting window. Second, you can press Ctrl+Enter (Cmd+Enter on a Mac). Either option runs the selected line(s) of the script, or the line containing the cursor if nothing is selected.

B.2 R language fundamentals

R is built around functions. The basic syntax of a function follows the form: function_name(arg1, arg2, ...).

With the base install, you gain access to many functions (2344 at the time of writing; the exact count varies by R version). Some examples:

# print
print("hello world!")
[1] "hello world!"
# sequence
seq(1, 10)
 [1]  1  2  3  4  5  6  7  8  9 10
# random numbers
rnorm(100, mean = 10, sd = 2)
  [1] 10.520563  7.265621  9.452974 10.485918 11.723184 11.342592  9.911046
  [8] 11.200871  9.304915  8.652179 10.115180  6.385661 11.514066 12.809505
 [15] 11.555426 11.464561  8.585822 10.892121 11.489303 10.497141 12.603356
 [22]  6.281753 12.154512  8.615354  9.791544 12.565492  9.760152  8.669989
 [29]  9.987318 10.631343 11.454385 10.950248  4.530696 11.604176 11.756297
 [36]  8.126701  9.329504  9.583132  9.426886  8.380884 12.572159 10.153491
 [43]  9.294735 11.788365 11.558692  9.245436 12.540502 13.352596  8.756149
 [50] 10.860714  9.344257  8.524431 12.983690 10.790139 12.245827 10.436265
 [57]  9.054471  9.342475  7.793375  9.991126  8.528948  8.170237 10.769353
 [64]  9.935336  9.866783 13.817615  6.117164  6.688069 11.649470 12.429177
 [71]  9.809361 10.810414 12.740699  9.042799 10.599858  8.800318  7.595015
 [78]  9.978435  6.712340  9.892352  9.182129  9.883731  8.651914  7.018634
 [85] 12.206669 12.051156 11.211947 11.009311  8.839525  7.470819  9.242782
 [92]  7.777310 13.494383 10.911545  8.824587  8.614696  8.597249  9.061000
 [99]  9.560631  7.443237
# average 
mean(rnorm(100))
[1] -0.2308983
# sum
sum(rnorm(100))
[1] 9.606905
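Arguments can be matched by position, as in the calls above, or by name. The short sketch below is an illustration rather than part of the workshop material, and the object names by_position and by_name are made up for the example.

```r
# Positional matching: values are assigned to arguments in the order given
by_position <- seq(1, 10)

# Named matching: the same call with the argument names spelled out
by_name <- seq(from = 1, to = 10)

# Both calls return the same sequence
identical(by_position, by_name)

# Named arguments can be supplied in any order
rnorm(3, sd = 2, mean = 10)
```

Naming arguments is optional, but it tends to make longer calls easier to read.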

Very often you will see functions used like this:

my_random_sum <- sum(rnorm(100))

The first part of the line is the name of an object that you make up. The second bit, <-, is the assignment operator. This tells R to take the result of sum(rnorm(100)) and store it in an object named my_random_sum. It is stored in the environment and can be used by just executing its name in the console.

my_random_sum
[1] -4.741143
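The stored value differs from the sum printed earlier because rnorm() draws new random numbers on every call. If you want results you can reproduce, one option is base R's set.seed() function. The sketch below assumes an arbitrary seed of 42, and the object names are made up for illustration.

```r
# Fix the starting point of the random number generator (42 is arbitrary)
set.seed(42)
first_sum <- sum(rnorm(100))

# Resetting the same seed reproduces the same random draws
set.seed(42)
second_sum <- sum(rnorm(100))

# The two sums match exactly
identical(first_sum, second_sum)

# A stored object keeps its value each time it is printed
first_sum
first_sum
```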

B.2.1 What is the environment?

Running code has one of two outcomes. Either the output is printed directly in the console, or nothing is printed because you stored the result in an object using <-. Stored output is saved in the environment, which is the collection of named objects held in memory for your current R session.
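A few base R functions let you inspect the environment from the console. This is a minimal sketch, and my_value is a made-up object name.

```r
# Store a result; nothing is printed
my_value <- sum(1:10)

# Check that the object now exists
exists("my_value")

# List the names of all objects in the current environment
ls()

# Print the stored value by running its name
my_value

# Remove the object when it is no longer needed
rm(my_value)
exists("my_value")
```

The same objects appear in the Environment tab in RStudio, so ls() and the tab should agree.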

B.3 Packages

The base installation of R is quite powerful, and packages extend it with additional functions.

B.3.1 CRAN

Many packages are available on CRAN, the Comprehensive R Archive Network. This is where you download R and also where most users get their packages. As of 2023-05-10, there are 19473 packages on CRAN!

B.3.2 Installing packages

When a package is installed, its files are downloaded and placed in your library, a folder on your computer where R keeps packages. A default library location is set for you.
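If you are curious where your library lives, base R can tell you. The sketch below only reads information and changes nothing on your computer. The paths and counts it prints will differ from machine to machine.

```r
# Show the library folder(s) R searches for installed packages
.libPaths()

# Count the packages currently installed across those folders
nrow(installed.packages())
```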

We use the install.packages() function to download and install a package. Here, we install the readxl package, which we use below to import data from an Excel file.

install.packages("readxl")

You should see some text in the R console showing progress of the installation and a prompt after installation is done.

After installation, you can load a package using the library() function. This makes all functions in a package available for you to use.

library(readxl)

An important aspect of packages is that you only need to install them once, but you need to load them with the library() function every time you start a new R session.
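Because installation happens only once, it can be handy to check whether a package is already installed before installing it again. The sketch below uses a hypothetical helper named is_installed() built on base R's requireNamespace(). The install line is commented out because it needs an internet connection.

```r
# Hypothetical helper: TRUE if a package is installed, FALSE otherwise
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# The stats package ships with R, so it should be found on a standard install
is_installed("stats")

# Install readxl only when it is missing
# if (!is_installed("readxl")) install.packages("readxl")

# Call a single function without library() by using package::function
stats::median(c(1, 3, 5))
```

The package::function form is useful when you need just one function or when two loaded packages share a function name.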

B.4 Data structures in R

Now we can talk about R data structures. Simply put, a data structure is a way for programming languages to handle information storage.

B.4.1 Vectors (one-dimensional data)

The basic data format in R is a vector: a one-dimensional grouping of elements that have the same type. These are all vectors, and they are created with the c() (concatenate) function:

dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)
log_var <- c(TRUE, FALSE, T, F)
chr_var <- c("a", "b", "c")

The four most common types of vectors are double (or numeric), integer, logical, and character. The following functions can return useful information about the vectors:

class(dbl_var)
[1] "numeric"
length(log_var)
[1] 4
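The "same type" rule has a practical consequence: if you mix types in c(), R converts every element to one common type. The sketch below illustrates this with a made-up vector named mixed and also shows typeof(), a base R function that reports how a vector is stored.

```r
dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)

# typeof() reports the underlying storage type
typeof(dbl_var)   # "double"
typeof(int_var)   # "integer"

# class() distinguishes integer vectors from doubles
class(int_var)    # "integer"

# Mixing types converts everything to the most flexible type
mixed <- c(1, "a", TRUE)
class(mixed)      # "character"
mixed             # "1" "a" "TRUE"
```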

B.4.2 Data frames (two-dimensional data)

A collection of vectors represented as one data object is often described as two-dimensional data, like a spreadsheet, or in R speak, a data frame. Here’s a simple example:

ltrs <- c("a", "b", "c")
nums <- c(1, 2, 3)
logs <- c(T, F, T)
mydf <- data.frame(ltrs, nums, logs)
mydf
  ltrs nums  logs
1    a    1  TRUE
2    b    2 FALSE
3    c    3  TRUE
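Once a data frame exists, its columns can be pulled back out as vectors. The sketch below rebuilds the same mydf and shows a few common base R operations; it is an illustration, not an exhaustive tour.

```r
ltrs <- c("a", "b", "c")
nums <- c(1, 2, 3)
logs <- c(TRUE, FALSE, TRUE)
mydf <- data.frame(ltrs, nums, logs)

# Extract one column as a vector with $
mydf$nums

# Number of rows and columns
nrow(mydf)   # 3
ncol(mydf)   # 3

# Use a column in a calculation
mean(mydf$nums)   # 2

# Keep only the rows where logs is TRUE
mydf[mydf$logs, ]
```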

The only constraints required to make a data frame are:

  1. Each column (vector) contains a single type of data.

  2. The number of observations in each column is equal.
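Both constraints can be checked from the console. In this sketch, the first call deliberately tries to combine columns of length 3 and 2, and tryCatch() captures the resulting error message instead of stopping the script. The names result and ok_df are made up for the example.

```r
# Columns of unequal length cannot be combined; capture the error message
result <- tryCatch(
  data.frame(x = c(1, 2, 3), y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
result

# Each column holds one type, but different columns may hold different types
ok_df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
sapply(ok_df, class)
```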

B.5 Getting your data into R

You will rarely enter data manually in R. Most data analysis workflows begin with importing a dataset from an external source. We’ll be using the read_excel() function from the readxl package.

We can import the ExampleSites.xlsx dataset as follows. Note the use of a relative file path, which R resolves starting from your “working directory”. You can see which folder that is using the getwd() function.

sitdat <- read_excel("data/ExampleSites.xlsx")
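If the import fails because R cannot find the file, the working directory is the first thing to check. The sketch below uses only base R functions, and xl_path is a made-up object name. Whether file.exists() returns TRUE depends on where the file sits on your computer.

```r
# Show the folder R treats as the starting point for relative paths
getwd()

# Build the relative path used in this section
xl_path <- file.path("data", "ExampleSites.xlsx")
xl_path

# Check that the file can be found before trying to import it
file.exists(xl_path)
```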

Let’s explore the dataset a bit.

# get the dimensions
dim(sitdat)
[1] 11  5
# get the column names
names(sitdat)
[1] "Monitoring Location ID"        "Monitoring Location Name"     
[3] "Monitoring Location Latitude"  "Monitoring Location Longitude"
[5] "Location Group"               
# see the first six rows
head(sitdat)
# A tibble: 6 × 5
  `Monitoring Location ID` `Monitoring Location Name` Monitoring Location Lati…¹
  <chr>                    <chr>                                           <dbl>
1 ABT-026                  Rte 2, Concord                                   42.5
2 ABT-062                  Rte 62, Acton                                    42.4
3 ABT-077                  Rte 27/USGS, Maynard                             42.4
4 ABT-144                  Rte 62, Stow                                     42.4
5 ABT-237                  Robin Hill Rd, Marlboro                          42.3
6 ABT-301                  Rte 9, Westboro                                  42.3
# ℹ abbreviated name: ¹​`Monitoring Location Latitude`
# ℹ 2 more variables: `Monitoring Location Longitude` <dbl>,
#   `Location Group` <chr>
# get the overall structure
str(sitdat)
tibble [11 × 5] (S3: tbl_df/tbl/data.frame)
 $ Monitoring Location ID       : chr [1:11] "ABT-026" "ABT-062" "ABT-077" "ABT-144" ...
 $ Monitoring Location Name     : chr [1:11] "Rte 2, Concord" "Rte 62, Acton" "Rte 27/USGS, Maynard" "Rte 62, Stow" ...
 $ Monitoring Location Latitude : num [1:11] 42.5 42.4 42.4 42.4 42.3 ...
 $ Monitoring Location Longitude: num [1:11] -71.4 -71.4 -71.4 -71.5 -71.6 ...
 $ Location Group               : chr [1:11] "Assabet" "Assabet" "Assabet" "Assabet" ...
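Notice that the column names in sitdat contain spaces, which is why head() prints them inside backticks. The sketch below uses a small made-up data frame named demo (not the workshop data) to show two ways of reaching such columns in base R.

```r
# Made-up stand-in with column names that contain spaces
demo <- data.frame(
  `Location Group` = c("Group A", "Group A", "Group B"),
  `Site Count` = c(2, 3, 5),
  check.names = FALSE
)

# Backticks let $ reach a column whose name contains a space
demo$`Site Count`

# Double brackets with a quoted name do the same thing
demo[["Site Count"]]

# Count the rows in each group
table(demo$`Location Group`)
```

The same two forms should work on sitdat, for example sitdat$`Location Group`.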

You can also view a dataset in a spreadsheet style using the View() function:

View(sitdat)

B.6 Summary

In this intro we learned about R and RStudio, some of the basic syntax and data structures in R, and how to import files. You’ll be able to follow the rest of the workshop with this knowledge. View the Resources page for additional training materials.