Appendix B — Introduction to R

B.1 RStudio

RStudio is the go-to Integrated Development Environment (IDE) for R. It bundles a script editor, the R console, and tools for browsing files, plots, and help into one window.

Let’s get familiar with RStudio.

B.1.1 Open R and RStudio

Find the RStudio shortcut on your computer and fire it up. The window that opens is divided into panes.

There are four panes in RStudio:

  • Source: Your primary window for writing code to send to the console. This is where you write and save R “scripts”
  • Console: This is where code is executed in R
  • Environment, History, etc.: A tabbed window showing your working environment, code execution history, and other useful things
  • Files, plots, etc.: A tabbed window showing a file explorer, a plot window, list of installed packages, help files, and viewer

B.1.2 Scripting

In most cases, you will not enter and execute code directly in the console. Code can be written in a script and then sent directly to the console.

Open a new script from the File menu…

B.1.3 Executing code in RStudio

After you write code in an R script, you can send it to the Console to run it. There are two ways to do this. First, you can click the Run button at the top right of the scripting window. Second, you can press Ctrl+Enter (Cmd+Enter on a Mac). Either option will run the line(s) of the script that are selected.

B.2 R language fundamentals

R is built around functions. The basic syntax of a function follows the form: function_name(arg1, arg2, ...).
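
Arguments can be matched by position or by name. The short sketch below uses seq() to show both styles; the object names (x_pos, x_named, x_reordered) are made up for illustration.

```r
# Positional matching: values are assigned to arguments in the order given
x_pos <- seq(1, 10, 2)

# Named matching: the same call with each argument spelled out
x_named <- seq(from = 1, to = 10, by = 2)

# Named arguments can be supplied in any order
x_reordered <- seq(by = 2, to = 10, from = 1)

x_pos                            # 1 3 5 7 9
identical(x_pos, x_named)        # TRUE
identical(x_pos, x_reordered)    # TRUE
```

Naming arguments takes more typing, but it tends to make a call easier to read when a function has many arguments.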

With the base install, you will gain access to many functions (2344, to be exact). Some examples:

# print
print("hello world!")
[1] "hello world!"
# sequence
seq(1, 10)
 [1]  1  2  3  4  5  6  7  8  9 10
# random numbers
rnorm(100, mean = 10, sd = 2)
  [1] 11.778339  8.168308 10.328107  6.389375 11.027014 11.422331  9.862396
  [8]  6.311983  9.129494 12.465959  6.664568  7.144092  9.338055 15.140616
 [15]  8.320478  7.762323 10.646784 11.620608  9.036408 14.022949 10.611767
 [22] 11.923009  6.893529  8.686084 10.709322  8.195416  8.502059 12.344259
 [29]  7.091856  8.315045 11.694358  8.039414 13.937610  9.713869 11.917939
 [36] 13.739459  8.559499 11.533641  9.074324 11.069826  7.910155 11.547564
 [43]  9.516735  9.910346 11.341098  8.295156  8.903312  9.369067 10.246925
 [50]  9.195306  8.714490 11.389991 11.509403 12.551401  9.968140  6.985595
 [57] 12.180542 11.877617  9.300282 13.222713 12.094793 11.507085  7.372819
 [64]  9.755827  9.720811  9.567607  9.764642 12.982514 10.496867 10.015366
 [71] 10.497256 12.238960 10.978105 11.598452 11.792907  6.690698  9.311789
 [78]  8.133298  9.293863  9.866241  9.461834 11.903913  9.162688 10.453227
 [85]  5.824510 10.282171  8.072022  5.698184 13.425251  9.034962 14.954518
 [92]  8.884073 13.029157  8.494096  8.557368 11.340525  6.520424 10.502925
 [99]  9.217062 10.108264
# average 
mean(rnorm(100))
[1] -0.0340971
# sum
sum(rnorm(100))
[1] -4.618643
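
The random numbers above will differ from what you see on your own computer, because rnorm() draws new values on every call. If you need repeatable results, one common approach is to set a seed first. This is a minimal sketch; the seed value 42 and the object names are arbitrary choices for illustration.

```r
# Setting a seed fixes the starting point of the random number generator
set.seed(42)               # 42 is an arbitrary value
first_draw <- rnorm(5)

# Resetting the same seed restarts the same sequence of draws
set.seed(42)
second_draw <- rnorm(5)

identical(first_draw, second_draw)   # TRUE
```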

Very often you will see functions used like this:

my_random_sum <- sum(rnorm(100))

The first part of the line is the name of an object that you make up. The second bit, <-, is the assignment operator. This tells R to take the result of sum(rnorm(100)) and store it in an object named my_random_sum. The object is stored in the environment, and you can see its value by executing its name in the console.

my_random_sum
[1] -0.389211

B.2.1 What is the environment?

Running code has one of two outcomes. Either the result is printed directly in the console, or nothing is printed because you stored the result in an object using <-. Stored results are saved in the environment, which is the collection of named objects held in memory for your current R session.
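
You can also inspect the environment from code rather than from the Environment tab. The sketch below uses the base functions ls(), exists(), and rm(); the object my_label is a made-up example.

```r
# Create two objects so there is something to look at
my_random_sum <- sum(rnorm(100))
my_label <- "example"

# ls() lists the names of the objects in the current environment
ls()

# exists() checks whether a name is defined
exists("my_label")    # TRUE

# rm() removes an object from the environment
rm(my_label)
exists("my_label")    # FALSE
```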

B.3 Packages

The base installation of R is quite powerful. Packages allow you to include new methods for use in R.

B.3.1 CRAN

Many packages are available on CRAN, the Comprehensive R Archive Network. This is where you download R and also where most users get their packages. As of 2023-11-13, there are 20015 packages on CRAN!

B.3.2 Installing packages

When a package is installed, its files are downloaded and placed in your library, a folder on your computer where R keeps packages. A default library location is set for you.
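
If you are curious where your library lives, base R can tell you. This is a small sketch using .libPaths() and installed.packages(); the object names are made up, and the paths and counts you see will depend on your own setup.

```r
# Folder(s) that R searches for installed packages;
# by default, new packages are installed into the first one
lib_paths <- .libPaths()
lib_paths

# Number of packages currently installed across those folders
n_installed <- nrow(installed.packages())
n_installed
```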

We use the install.packages() function to download and install a package. Here, we install the readxl package, which is used below to import data from an Excel file.

install.packages("readxl")

You should see some text in the R console showing progress of the installation and a prompt after installation is done.

After installation, you can load a package using the library() function. This makes all functions in a package available for you to use.

library(readxl)

An important aspect of packages is that you only need to install them once, but every time you start a new R session (for example, by restarting RStudio) you need to load them again with the library() function.
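
One way to express "install once, load every session" in a script is to install only when a package is missing. The helper below, ensure_package(), is hypothetical (it is not part of R or readxl) and is only a sketch of the pattern. It is demonstrated with stats, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing,
# then attach it for the current session
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# "stats" comes with every R installation, so this only attaches it
ensure_package("stats")

# The attached package now appears on the search path
"package:stats" %in% search()    # TRUE
```

Calling ensure_package("readxl") would follow the same steps, installing readxl on the first call only if it is not already in your library.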

B.4 Data structures in R

Now we can talk about R data structures. Simply put, a data structure is a way for programming languages to handle information storage.

B.4.1 Vectors (one-dimensional data)

The basic data format in R is a vector: a one-dimensional grouping of elements that have the same type. These are all vectors, and they are created with the c() function (short for combine or concatenate):

dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)
log_var <- c(TRUE, FALSE, T, F)
chr_var <- c("a", "b", "c")

The four most common types of vectors are double (or numeric), integer, logical, and character. The following functions can return useful information about the vectors:

class(dbl_var)
[1] "numeric"
length(log_var)
[1] 4
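
Because every element of a vector must have the same type, R converts values when you mix types in c(). The sketch below shows this with typeof(), which reports how a vector is stored; the object mixed is a made-up example.

```r
dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)

# typeof() reports the underlying storage type
typeof(dbl_var)    # "double"
typeof(int_var)    # "integer"

# Mixing types coerces every element to the most flexible type present
mixed <- c(1, "a", TRUE)
mixed              # "1" "a" "TRUE"
class(mixed)       # "character"
```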

B.4.2 Data frames (two-dimensional data)

A collection of vectors represented as one data object is often described as two-dimensional data, like a spreadsheet, or in R speak, a data frame. Here’s a simple example:

ltrs <- c("a", "b", "c")
nums <- c(1, 2, 3)
logs <- c(T, F, T)
mydf <- data.frame(ltrs, nums, logs)
mydf
  ltrs nums  logs
1    a    1  TRUE
2    b    2 FALSE
3    c    3  TRUE

The only constraints required to make a data frame are:

  1. Each column (vector) contains a single type of data, although different columns can hold different types.

  2. The number of observations in each column is equal.
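
The two constraints can be checked with a short sketch. It rebuilds the example data frame, looks at the type of each column, and then tries to combine columns of unequal length; the object result and its message text are made up for illustration.

```r
ltrs <- c("a", "b", "c")
nums <- c(1, 2, 3)
logs <- c(TRUE, FALSE, TRUE)
mydf <- data.frame(ltrs, nums, logs)

# Each column keeps its own type
sapply(mydf, class)

# A single column can be pulled out as a vector with $
mydf$nums    # 1 2 3

# Columns with 3 and 2 values cannot be combined, so data.frame() stops
# with an error; tryCatch() catches it and returns a short message instead
result <- tryCatch(
  data.frame(x = c(1, 2, 3), y = c("a", "b")),
  error = function(e) "error: columns differ in length"
)
result       # "error: columns differ in length"
```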

B.5 Getting your data into R

You will rarely enter your data manually in R. Most data analysis workflows begin with importing a dataset from an external source. We’ll be using the read_excel() function from the readxl package.

We can import the ExampleSites.xlsx dataset as follows. Note the use of a relative file path, which R resolves against your “working directory”. You can see which folder that is using the getwd() function.

sitdat <- read_excel("data/ExampleSites.xlsx")
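
If the import cannot find the file, a reasonable first check is whether the path points where you think it does. This sketch uses only base R functions; the object name xlsx_path is made up, and the result of file.exists() depends on your own folders.

```r
# The folder that relative paths are resolved against
getwd()

# Build the relative path used above from its parts
xlsx_path <- file.path("data", "ExampleSites.xlsx")
xlsx_path    # "data/ExampleSites.xlsx"

# TRUE if the file is where R expects it, FALSE otherwise
file.exists(xlsx_path)
```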

Let’s explore the dataset a bit.

# get the dimensions
dim(sitdat)
[1] 11  5
# get the column names
names(sitdat)
[1] "Monitoring Location ID"        "Monitoring Location Name"     
[3] "Monitoring Location Latitude"  "Monitoring Location Longitude"
[5] "Location Group"               
# see the first six rows
head(sitdat)
# A tibble: 6 × 5
  `Monitoring Location ID` `Monitoring Location Name` Monitoring Location Lati…¹
  <chr>                    <chr>                                           <dbl>
1 ABT-026                  Rte 2, Concord                                   42.5
2 ABT-062                  Rte 62, Acton                                    42.4
3 ABT-077                  Rte 27/USGS, Maynard                             42.4
4 ABT-144                  Rte 62, Stow                                     42.4
5 ABT-237                  Robin Hill Rd, Marlboro                          42.3
6 ABT-301                  Rte 9, Westboro                                  42.3
# ℹ abbreviated name: ¹​`Monitoring Location Latitude`
# ℹ 2 more variables: `Monitoring Location Longitude` <dbl>,
#   `Location Group` <chr>
# get the overall structure
str(sitdat)
tibble [11 × 5] (S3: tbl_df/tbl/data.frame)
 $ Monitoring Location ID       : chr [1:11] "ABT-026" "ABT-062" "ABT-077" "ABT-144" ...
 $ Monitoring Location Name     : chr [1:11] "Rte 2, Concord" "Rte 62, Acton" "Rte 27/USGS, Maynard" "Rte 62, Stow" ...
 $ Monitoring Location Latitude : num [1:11] 42.5 42.4 42.4 42.4 42.3 ...
 $ Monitoring Location Longitude: num [1:11] -71.4 -71.4 -71.4 -71.5 -71.6 ...
 $ Location Group               : chr [1:11] "Assabet" "Assabet" "Assabet" "Assabet" ...
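
The column names in this dataset contain spaces, so they need backticks or quotes when you refer to them. The sketch below builds a small stand-in data frame called sites (a made-up name) from a few values shown in the output above, so that it runs without the Excel file.

```r
# Stand-in for the imported data; check.names = FALSE keeps the spaces
# in the column names instead of replacing them with dots
sites <- data.frame(
  "Monitoring Location ID" = c("ABT-026", "ABT-062", "ABT-077"),
  "Location Group" = c("Assabet", "Assabet", "Assabet"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Backticks let $ work with names that contain spaces
sites$`Monitoring Location ID`    # "ABT-026" "ABT-062" "ABT-077"

# Double brackets with a quoted name select the same way
sites[["Location Group"]]         # "Assabet" "Assabet" "Assabet"
```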

You can also view a dataset in a spreadsheet style using the View() function:

View(sitdat)

B.6 Summary

In this intro we learned about R and RStudio, some of the basic syntax and data structures in R, and how to import files. You’ll be able to follow the rest of the workshop with this knowledge.