4 Working with R and RStudio

This section includes a tutorial on some data analysis topics that will be helpful for this class. Please make sure you’ve installed R, RStudio, and some essential packages according to this guide. You can skip installing git for now.

4.1 Introduction to RStudio

RStudio is organized into four panels: scripts & documents (upper left), the console (lower left), and two utility panels on the right that have various helpful functions. You may not see a document window on the left; if so, hit Ctrl+Shift+N to create a blank script. (Note that on Macs, Ctrl should pretty much always be replaced with Cmd.) Generally, you write all of your code in the document window, then hit Ctrl+Enter (Cmd+Return) to send the R statement your cursor is on to the console window.

4.1.1 RStudio Projects

RStudio’s most useful feature is R projects, which automatically manage a lot of the tedious things that can normally cause problems when working with R. Most importantly, they make it easier to keep track of where your script, data, and output files are.

Let’s create a project for Bio 373L. In the upper right corner, click the down arrow next to Project: (None) (if it says something else, that’s fine), then select New Project…. Select New Directory -> New Project, then give it a name relevant to the class. I’d recommend making the project a subdirectory of where you keep the rest of your 373L files. Create the project; RStudio will take a moment to reset, then the upper-right corner of the screen should have the name of your project. Always make sure the project is loaded when you’re working on material for this class; if it isn’t, just click the project drop-down arrow and select it from the list.

Now that your project is loaded, click the Files tab in one of the two utility panels; this shows everything in the project directory. All of the script & data files you work with should be put in this directory. I recommend creating a folder for each lab (click the New Folder button) to keep things organized. If you want to open the folder in Windows Explorer, Finder, or whatever file manager you use, click the More button, then Show Folder in New Window.
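If you prefer typing to clicking, the same folder can be created from the console. This is a minimal sketch: `lab_01` is a hypothetical folder name, and it assumes your project is open so that R's working directory is the project folder.

```r
# With a project open, the working directory is the project folder
getwd()

# Create a folder for one lab ("lab_01" is a hypothetical name);
# showWarnings = FALSE keeps it quiet if the folder already exists
dir.create("lab_01", showWarnings = FALSE)

# List the contents of the working directory to confirm the folder is there
list.files()
```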

4.1.2 Customizing RStudio

I have a couple of recommendations for customizing RStudio to make it easier to use. Go to Tools -> Global Options.

Under the General tab, check the Workspace options: make sure Restore .RData into workspace at startup is unchecked and Save workspace to .RData on exit is set to Never. This ensures you start with a fresh slate every time you start up R and prevents some weird errors from cropping up.

Under Appearance, I’d take a look at some of the Editor themes. I’m rather fond of Vibrant Ink.

Under Pane Layout, you can reorganize your panels. I like to put the Console in the lower right, and make sure that the lower left pane contains only History, Connections, Packages, and Tutorial. I find those four tabs to be generally useless, so I can keep that pane minimized and have a larger document window on the left.

4.2 R basics

4.2.1 Statements & vectors

Let’s take a look at some R basics. First, R can be used as an excessively fancy calculator. The following block contains R expressions, followed by the results of running them in the console (preceded with ##). Try running it yourself.

(4^2 + 8)/10
## [1] 2.4
log(5) + 12
## [1] 13.60944
sqrt(abs(-20))
## [1] 4.472136
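R follows the usual order of operations, so parentheses matter. The short example below (an addition, not part of the original exercise) shows how dropping the parentheses from the first expression changes the result, plus one common surprise with negative numbers.

```r
(4^2 + 8) / 10   # parentheses first: 24 / 10 gives 2.4
4^2 + 8 / 10     # ^ first, then /, then +: 16 + 0.8 gives 16.8
-2^2             # ^ binds tighter than the minus sign, so this is -(2^2): -4
(-2)^2           # wrap the negative number to square it: 4
```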

R works naturally with vectors of numbers (or text).

1:10 # Create a sequence of numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
c(1, 4, 9, 12, 98.7) # use c() to make a vector
## [1]  1.0  4.0  9.0 12.0 98.7
c("A", "B", "C", "D") # Here's a character vector
## [1] "A" "B" "C" "D"

#  Most operations work with vectors
(1:10) + 2
##  [1]  3  4  5  6  7  8  9 10 11 12
(1:5) + c(10, 20, 30, 40, 50)
## [1] 11 22 33 44 55

# Vectors can only be of one type; mixing numbers & text will convert them all to text
c("I have been at UT for ", 5, "Years")
## [1] "I have been at UT for " "5"                      "Years"

Note that anything following a # is a comment and is ignored by R. I highly advise using comments to document your code.
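A few more vector basics, sketched below: length() counts elements, class() reports the type, and square brackets pick out elements by position. The example stores the vector in x using <-, which is covered in the next section. The last line shows recycling, where a shorter vector is repeated to match a longer one.

```r
x <- c(1, 4, 9, 12, 98.7)

length(x)      # number of elements: 5
class(x)       # "numeric"
x[2]           # square brackets pick out elements by position: 4
x[c(1, 3)]     # several positions at once: 1 9

# Mixing numbers and text gives a character vector
class(c("I have been at UT for ", 5, "Years"))   # "character"

# A shorter vector is recycled to match a longer one
(1:6) + c(10, 20)   # 11 22 13 24 15 26
```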

4.2.2 Variables

You can save values & objects by creating variables.

# You can use either <- or = to assign a variable
first_ten <- 1:10
second_ten = 11:20 

# Run the variable's name to see its value (this is called printing)
first_ten
##  [1]  1  2  3  4  5  6  7  8  9 10
second_ten
##  [1] 11 12 13 14 15 16 17 18 19 20

# You can use variables just like you would use their values
first_ten + 1 
##  [1]  2  3  4  5  6  7  8  9 10 11
first_ten + second_ten
##  [1] 12 14 16 18 20 22 24 26 28 30

# Note that variable names are case-sensitive
first_Ten # doesn't work
## Error in eval(expr, envir, enclos): object 'first_Ten' not found
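Variables hold whatever was last assigned to them. In the sketch below, storing a calculation in a new variable leaves the original untouched, while assigning to an existing name replaces its old value. `doubled` is just an illustrative name.

```r
first_ten <- 1:10

# Store a result in a new variable; first_ten itself is unchanged
doubled <- first_ten * 2
doubled     # 2 4 6 ... 20
first_ten   # still 1 through 10

# Assigning to an existing name replaces its old value
first_ten <- first_ten + 100
first_ten   # 101 102 ... 110
```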

4.2.3 Reading & working with data frames

Before we get started with this, you’ll need to download an example data file. From RStudio, create an example_data directory inside your project folder, then save this file in it (make sure the name is still anoles.csv). Note that you may need to go to File -> Save Page As… (or some variant) in your web browser to save it.

Now, let’s load the data into R. To do that, we need to load the readr package, which is part of the tidyverse. We will be using the read_csv() function. Note that there’s also a read.csv() function; don’t use that one, because it has a tendency to change the column names of your data. RStudio also has some built-in ways to load datasets; I would strongly advise against using them, because they make it harder to go back & repeat your analysis if something changes.

library(tidyverse) 
lizards <- read_csv("example_data/anoles.csv") # Note that the path is relative to your project directory.
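To see what read.csv() does to column names, the sketch below writes a tiny CSV to a temporary file and reads it back with base R. The column names and values are made up for illustration. read.csv() passes the names through make.names(), which replaces spaces and punctuation with dots unless you set check.names = FALSE. read_csv() leaves the names as written.

```r
# Write a tiny, made-up CSV whose headers contain a space and parentheses
tmp <- tempfile(fileext = ".csv")
writeLines(c("Color morph,Mass (g)", "Green,6.46", "Brown,5.82"), tmp)

# Base R's read.csv() rewrites the names into "syntactically valid" ones
names(read.csv(tmp))                       # "Color.morph" "Mass..g."

# check.names = FALSE keeps the names as they appear in the file
names(read.csv(tmp, check.names = FALSE))  # "Color morph" "Mass (g)"

# readr::read_csv(tmp) would also keep the original names
```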

This is a data frame (effectively a spreadsheet). Technically, it’s a type of data frame called a tibble, which doesn’t really matter for what we’re doing right now. Let’s take a look at it:

# Quick view of the data frame; note that there may be more columns and rows
# than are displayed
lizards
## # A tibble: 657 × 9
##    Site  Color_morph  Limb  Mass Diameter Height   SVL  Tail Perch_type
##    <chr> <chr>       <dbl> <dbl>    <dbl>  <dbl> <dbl> <dbl> <chr>     
##  1 A     Green        14.3  6.46        8    164  61.8  43.9 Other     
##  2 A     Brown        12.3  5.82       18    151  57.1  42.2 Tree      
##  3 A     Blue         10.5  4.29       36    130  49.1  25   Building  
##  4 A     Brown        10.3  5.29       31    131  51.2  38.2 Tree      
##  5 A     Brown        10.9  5.69       20    138  51.5  46.9 Shrub     
##  6 A     Brown        10.4  5.84       25    137  45.3  59   Shrub     
##  7 A     Green        11.1  5.91        7    138  49.7  47.6 Building  
##  8 A     Brown        10    5.09       20    141  48    34.9 Tree      
##  9 A     Brown        12.3  7.2        19    129  54.9  61   Tree      
## 10 A     Brown        11.2  6.66       15    134  52.7  50.4 Tree      
## # … with 647 more rows
## # ℹ Use `print(n = ...)` to see more rows
# Look at the first few values in each column
glimpse(lizards)
## Rows: 657
## Columns: 9
## $ Site        <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"…
## $ Color_morph <chr> "Green", "Brown", "Blue", "Brown", "Brown", "Brown", "Gree…
## $ Limb        <dbl> 14.3, 12.3, 10.5, 10.3, 10.9, 10.4, 11.1, 10.0, 12.3, 11.2…
## $ Mass        <dbl> 6.46, 5.82, 4.29, 5.29, 5.69, 5.84, 5.91, 5.09, 7.20, 6.66…
## $ Diameter    <dbl> 8, 18, 36, 31, 20, 25, 7, 20, 19, 15, 31, 35, 10, 19, 30, …
## $ Height      <dbl> 164, 151, 130, 131, 138, 137, 138, 141, 129, 134, 143, 150…
## $ SVL         <dbl> 61.8, 57.1, 49.1, 51.2, 51.5, 45.3, 49.7, 48.0, 54.9, 52.7…
## $ Tail        <dbl> 43.9, 42.2, 25.0, 38.2, 46.9, 59.0, 47.6, 34.9, 61.0, 50.4…
## $ Perch_type  <chr> "Other", "Tree", "Building", "Tree", "Shrub", "Shrub", "Bu…
# View the data in an RStudio pane
View(lizards)
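A few base R functions report a data frame's size and column names. So that this sketch runs without the CSV, it builds a small stand-in called lizards_small (a hypothetical name) from the first five rows and three of the columns printed above. On the full lizards data, nrow() and ncol() should report the 657 rows and 9 columns that glimpse() listed.

```r
# Stand-in built from the first five rows of the anoles data shown above
lizards_small <- data.frame(
  Site        = c("A", "A", "A", "A", "A"),
  Color_morph = c("Green", "Brown", "Blue", "Brown", "Brown"),
  Mass        = c(6.46, 5.82, 4.29, 5.29, 5.69)
)

nrow(lizards_small)          # number of rows: 5
ncol(lizards_small)          # number of columns: 3
names(lizards_small)         # "Site" "Color_morph" "Mass"
head(lizards_small, n = 2)   # just the first two rows
```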

Each column of the data frame is a vector, and all the columns have the same length. We can pull out individual columns and work with them directly:

# Let's extract the color column
lizards$Color_morph
lizards[["Color_morph"]] 
pull(lizards, Color_morph) # requires dplyr package, which is in the tidyverse
# Note that [[ ]] needs the column name in quotes, while $ and pull() don't
# I haven't included output here, because it's rather long
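Once a column is extracted, it behaves like any other vector. The sketch below uses a hypothetical stand-in data frame built from the first five rows printed earlier, so its output is short. It confirms that $ and [[ ]] return the same vector and then feeds that vector to ordinary functions.

```r
# Stand-in built from the first five rows of the anoles data shown above
lizards_small <- data.frame(
  Color_morph = c("Green", "Brown", "Blue", "Brown", "Brown"),
  Mass        = c(6.46, 5.82, 4.29, 5.29, 5.69)
)

mass <- lizards_small$Mass
mass                                       # a plain numeric vector
identical(mass, lizards_small[["Mass"]])   # TRUE: same vector either way
length(mass) == nrow(lizards_small)        # TRUE: one value per row

# Once extracted, a column works with any vector function
mean(mass)                                 # 5.51
table(lizards_small$Color_morph)           # count of each color morph
```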

4.2.4 Functions

Pretty much everything that isn’t data is a function. Some of the functions we’ve already used include log, abs, sqrt, and read_csv. Most functions have arguments, which tell the function what to work with. For example:

mean(x = 1:5) # mean of 1 through 5
## [1] 3
sd(x = lizards$Mass) # standard deviation of lizard mass
## [1] 1.088521
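To back up the claim that almost everything is a function, class() confirms that mean is one. Even arithmetic operators like + can be called in function form by wrapping them in backticks. Function calls can also be nested, in which case the innermost call is evaluated first.

```r
class(mean)          # "function"
`+`(2, 3)            # same as 2 + 3: 5

# Nested calls run from the inside out: sqrt(20) first, then round()
round(sqrt(20), digits = 2)   # 4.47
```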

Functions can have multiple arguments; for example log has the arguments x and base. Arguments can be matched by name or by their position. Some arguments have default values that are used if the argument isn’t provided.

log(x = 1:5) # argument is matched by name; base uses its default value, exp(1)
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
log(1:5, base = 10) # specifies a base; this overrides the default
## [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700
log(1:5, 10) # same as above, but matched by position
## [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700
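args() shows a function's arguments and their defaults. The sketch below also shows the practical difference between the two matching styles: named arguments can appear in any order, but positional arguments are matched strictly left to right, so swapping them changes the answer.

```r
args(log)                 # function (x, base = exp(1))

# Named arguments can be given in any order
log(base = 10, x = 100)   # 2

# Positional arguments are matched in order
log(100, 10)              # log base 10 of 100: 2
log(10, 100)              # log base 100 of 10: 0.5
```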

4.2.5 Getting Help

R has a built-in help system to look up functions, their arguments, and what they do:

?read_csv
?mean
?log
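Beyond ?, a few other base R helpers are worth knowing. args() prints just a function's arguments. example() runs the examples listed at the bottom of a help page. ?? (short for help.search()) searches all installed help pages, which is useful when you don't know a function's name.

```r
args(sd)                  # only the argument list: function (x, na.rm = FALSE)
example(mean)             # run the examples from the bottom of a help page
??"standard deviation"    # search every installed help page for a phrase
```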