4 Working with R and RStudio
This section includes a tutorial on some data analysis topics that will be helpful for this class. Please make sure you’ve installed R, RStudio, and some essential packages according to this guide. You can skip installing git for now.
4.1 Introduction to RStudio
RStudio is organized into four panels: scripts & documents (upper left), the console (lower left), and two utility panels on the right with various helpful functions. If you don’t see a document window on the left, hit Ctrl+Shift+N to create a blank script. (On Macs, Ctrl should almost always be replaced with Cmd.) Generally, you write all of your code in the document window, then hit Ctrl+Enter (Cmd+Return) to send the R statement your cursor is on to the console.
4.1.1 RStudio Projects
RStudio’s most useful feature is R projects, which automatically manage a lot of the tedious things that can normally cause problems when working with R. Most importantly, they make it easier to keep track of where your script, data, and output files are.
Let’s create a project for Bio 373L. In the upper right corner, click the down arrow next to Project: (None) (if it says something else, that’s fine), then select New Project…. Select New Directory -> New Project, then give it a name relevant to the class. I’d recommend making the project a subdirectory of where you keep the rest of your 373L files. Create the project; RStudio will take a moment to reset, then the upper-right corner of the screen should have the name of your project. Always make sure the project is loaded when you’re working on material for this class; if it isn’t, just click the project drop-down arrow and select it from the list.
Now that your project is loaded, click the Files tab in one of the two utility panels; this shows everything in the project directory. All of the script & data files you work with should be put in this directory. I recommend creating a folder for each lab (click the New Folder button) to keep things organized. If you want to open the folder in Windows Explorer, Finder, or whatever file manager you use, click the More button, then Show Folder in New Window.
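If you’d rather stay in the console, base R can handle the same folder housekeeping. This is a minimal sketch, assuming your project is loaded (which makes the project folder your working directory); lab01 is a placeholder folder name, not something the class requires:

```r
# With a project loaded, this should print the project folder's path
getwd()

# Create a folder for a lab; "lab01" is a placeholder name.
# showWarnings = FALSE keeps it quiet if the folder already exists.
dir.create("lab01", showWarnings = FALSE)

# List the contents of the project directory (similar to the Files tab)
list.files()
```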
4.1.2 Customizing RStudio
I have a couple of recommendations for customizing RStudio to make it easier to use. Go to Tools -> Global Options.
Under the General tab, make sure that the Workspace options are unchecked and set to Never; this will make sure you start with a fresh slate every time you start up R and prevent some weird errors from cropping up.
Under Appearance, I’d take a look at some of the Editor themes. I’m rather fond of Vibrant Ink.
Under Pane Layout, you can reorganize your panels. I like to put the Console in the lower right and make sure that the lower left pane contains only History, Connections, Packages, and Tutorial. I rarely use those four, so I can keep that pane minimized and have a larger document window on the left.
4.2 R basics
4.2.1 Statements & vectors
Let’s take a look at some R basics. First, R can be used as an excessively fancy calculator. The following block contains R expressions, each followed by the result of running it in the console (preceded by ##). Try running it yourself.
(4^2 + 8)/10
## [1] 2.4
log(5) + 12
## [1] 13.60944
sqrt(abs(-20))
## [1] 4.472136
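R follows the usual order of operations, so the parentheses in the first expression above matter. A quick comparison (these lines are an added illustration, not part of the original exercises):

```r
(4^2 + 8)/10   # parentheses first: (16 + 8)/10
## [1] 2.4
4^2 + 8/10     # without them, division happens before addition: 16 + 0.8
## [1] 16.8
exp(log(5))    # exp() undoes log(), since log() is the natural log by default
## [1] 5
```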
R works naturally with vectors of numbers (or text).
1:10 # Create a sequence of numbers
## [1] 1 2 3 4 5 6 7 8 9 10
c(1, 4, 9, 12, 98.7) # use c() to make a vector
## [1] 1.0 4.0 9.0 12.0 98.7
c("A", "B", "C", "D") # Here's a character vector
## [1] "A" "B" "C" "D"
# Most operations work with vectors
(1:10) + 2
## [1] 3 4 5 6 7 8 9 10 11 12
(1:5) + c(10, 20, 30, 40, 50)
## [1] 11 22 33 44 55
# Vectors can only be of one type; mixing numbers & text will convert them all to text
c("I have been at UT for ", 5, "Years")
## [1] "I have been at UT for " "5" "Years"
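Two more vector behaviors are worth knowing. When two vectors have different lengths, R recycles the shorter one, and class() reports what type a vector ended up as. A short sketch:

```r
# The shorter vector is recycled: 10, 20, 10, 20, 10, 20
1:6 + c(10, 20)
## [1] 11 22 13 24 15 26

# class() reports the type of a vector
class(c(1, 4, 9))
## [1] "numeric"
class(c("I have been at UT for ", 5, "Years"))
## [1] "character"

# Once a number has become text, convert it back before doing math
as.numeric("5") + 1
## [1] 6
```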
Note that anything following a # is a comment and is ignored by R. I highly advise using comments to document your code.
4.2.2 Variables
You can save values & objects by creating variables.
# You can use either <- or = to assign a variable
first_ten <- 1:10
second_ten = 11:20
# Run the variable's name to see its value (this is called printing)
first_ten
## [1] 1 2 3 4 5 6 7 8 9 10
second_ten
## [1] 11 12 13 14 15 16 17 18 19 20
# You can use variables just like you would use their values
first_ten + 1
## [1] 2 3 4 5 6 7 8 9 10 11
first_ten + second_ten
## [1] 12 14 16 18 20 22 24 26 28 30
# Note that variable names are case-sensitive
first_Ten # doesn't work
## Error in eval(expr, envir, enclos): object 'first_Ten' not found
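Assigning one variable to another stores a copy of the value, so later changes to one don’t affect the other, and assigning to an existing name simply overwrites it. A short illustration; my_numbers and my_copy are made-up names for this example:

```r
my_numbers <- 1:5
my_copy <- my_numbers   # my_copy now holds 1 through 5

# Overwrite my_numbers by assigning to it again
my_numbers <- my_numbers * 2
my_numbers
## [1]  2  4  6  8 10

# The copy still holds the original values
my_copy
## [1] 1 2 3 4 5

# ls() lists the variables you've created in this session
ls()
```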
4.2.3 Reading & working with data frames
Before we get started with this, you’ll need to download an example data file. From RStudio, create an example_data directory, then save this file in it (make sure the name is still anoles.csv). Note that you may need to go to File -> Save Page As… (or some variant) in your web browser to save it.
Now, let’s load the data into R. To do that, we need the readr package, which is part of the tidyverse. We will be using the read_csv() function. Note that there’s also a read.csv() function; don’t use that one, as it has a tendency to change the column names of your data.
RStudio also has some built-in ways to load datasets; I would strongly advise not using them, because they make it harder to go back & repeat your analysis if something changes.
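To see the column-name problem with read.csv() for yourself, here is a small sketch. It writes a throwaway CSV whose header contains a space ("Perch type" is a made-up header for illustration; the real file uses Perch_type), then reads it both ways. It assumes readr is installed (it comes with the tidyverse); show_col_types = FALSE only silences readr’s column summary.

```r
library(readr)

# Write a tiny temporary CSV; the header "Perch type" contains a space
path <- tempfile(fileext = ".csv")
writeLines(c("Site,Perch type,Mass",
             "A,Other,6.46",
             "A,Tree,5.82"), path)

base_version  <- read.csv(path)
readr_version <- read_csv(path, show_col_types = FALSE)

names(base_version)    # read.csv() turns the space into a dot: "Perch.type"
names(readr_version)   # read_csv() keeps the name as written: "Perch type"
```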
library(tidyverse)
lizards <- read_csv("example_data/anoles.csv") # Note that the path is relative to your project directory.
This is a data frame (effectively a spreadsheet). Technically, it’s a type of data frame called a tibble, which doesn’t really matter for what we’re doing right now. Let’s take a look at it:
# Quick view of the data frame; note that there may be more columns and rows
# than are displayed
lizards
## # A tibble: 657 × 9
## Site Color_morph Limb Mass Diameter Height SVL Tail Perch_type
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 A Green 14.3 6.46 8 164 61.8 43.9 Other
## 2 A Brown 12.3 5.82 18 151 57.1 42.2 Tree
## 3 A Blue 10.5 4.29 36 130 49.1 25 Building
## 4 A Brown 10.3 5.29 31 131 51.2 38.2 Tree
## 5 A Brown 10.9 5.69 20 138 51.5 46.9 Shrub
## 6 A Brown 10.4 5.84 25 137 45.3 59 Shrub
## 7 A Green 11.1 5.91 7 138 49.7 47.6 Building
## 8 A Brown 10 5.09 20 141 48 34.9 Tree
## 9 A Brown 12.3 7.2 19 129 54.9 61 Tree
## 10 A Brown 11.2 6.66 15 134 52.7 50.4 Tree
## # … with 647 more rows
## # ℹ Use `print(n = ...)` to see more rows
# Look at the first few rows of each column
glimpse(lizards)
## Rows: 657
## Columns: 9
## $ Site <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"…
## $ Color_morph <chr> "Green", "Brown", "Blue", "Brown", "Brown", "Brown", "Gree…
## $ Limb <dbl> 14.3, 12.3, 10.5, 10.3, 10.9, 10.4, 11.1, 10.0, 12.3, 11.2…
## $ Mass <dbl> 6.46, 5.82, 4.29, 5.29, 5.69, 5.84, 5.91, 5.09, 7.20, 6.66…
## $ Diameter <dbl> 8, 18, 36, 31, 20, 25, 7, 20, 19, 15, 31, 35, 10, 19, 30, …
## $ Height <dbl> 164, 151, 130, 131, 138, 137, 138, 141, 129, 134, 143, 150…
## $ SVL <dbl> 61.8, 57.1, 49.1, 51.2, 51.5, 45.3, 49.7, 48.0, 54.9, 52.7…
## $ Tail <dbl> 43.9, 42.2, 25.0, 38.2, 46.9, 59.0, 47.6, 34.9, 61.0, 50.4…
## $ Perch_type <chr> "Other", "Tree", "Building", "Tree", "Shrub", "Shrub", "Bu…
# View the data in an RStudio pane
View(lizards)
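A few base R functions give quick answers about a data frame’s shape. In case the anoles file isn’t on hand, this sketch uses toy, a tiny made-up stand-in built from the first three rows printed above; with the real data, nrow(lizards) should give 657 and ncol(lizards) 9, matching the tibble header.

```r
# A tiny stand-in for lizards, using values from the first three rows above
toy <- data.frame(
  Site        = c("A", "A", "A"),
  Color_morph = c("Green", "Brown", "Blue"),
  Mass        = c(6.46, 5.82, 4.29)
)

nrow(toy)     # number of rows
ncol(toy)     # number of columns
dim(toy)      # rows and columns together
names(toy)    # column names
summary(toy)  # quick summary of each column
```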
Each column of the data frame is a vector, and all of the columns have the same length. We can pull out columns and work with them directly:
# Let's extract the color column
lizards$Color_morph
lizards[["Color_morph"]]
pull(lizards, Color_morph) # requires the dplyr package, which is in the tidyverse
# Note that some of these require quotes and some of them don't.
# I haven't included output here, because it's rather long.
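The three approaches above return the same vector. Here is a quick check on a small made-up data frame (toy and its values are illustrative, taken from the first rows of the anoles data shown earlier). It also shows that single brackets behave differently, returning a one-column data frame rather than a vector. pull() comes from dplyr.

```r
library(dplyr)

toy <- data.frame(
  Color_morph = c("Green", "Brown", "Blue"),
  Mass        = c(6.46, 5.82, 4.29)
)

with_dollar   <- toy$Color_morph
with_brackets <- toy[["Color_morph"]]
with_pull     <- pull(toy, Color_morph)

# All three give the same character vector
identical(with_dollar, with_brackets) && identical(with_brackets, with_pull)

# Single brackets keep the data frame structure
class(toy["Color_morph"])   # "data.frame"

# Extracted columns work like any other vector
mean(toy$Mass)
table(toy$Color_morph)
```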
4.2.4 Functions
Pretty much everything that isn’t data is a function. Examples include log, abs, read_csv, and mean. Most functions have arguments, which tell the function what to work with. For example:
mean(x = 1:5) # mean of 1 through 5
## [1] 3
sd(x = lizards$Mass) # standard deviation of lizard mass
## [1] 1.088521
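For reference, sd() computes the sample standard deviation, dividing by n - 1 rather than n. A worked check on the first five Mass values shown in the glimpse() output above:

```r
# First five lizard masses from the glimpse() output
x <- c(6.46, 5.82, 4.29, 5.29, 5.69)

n <- length(x)
x_bar <- sum(x) / n                        # same as mean(x)
s <- sqrt(sum((x - x_bar)^2) / (n - 1))    # sample standard deviation

all.equal(x_bar, mean(x))   # TRUE
all.equal(s, sd(x))         # TRUE
```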
Functions can have multiple arguments; for example, log has the arguments x and base. Arguments can be matched by name or by their position. Some arguments have default values that are used if the argument isn’t provided.
log(x = 1:5) # argument is matched by name; base uses its default value
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
log(1:5, base = 10) # specifies a base; this overrides the default
## [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700
log(1:5, 10) # same as above, but matched by position
## [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700