By the end of this lesson, you should be able to:
This semester, we will be using R to perform data manipulation, analysis, and visualization
R is a free, open source computing software used for statistical analysis and graphing. R is also a computing language. This means that R software interprets the R language, which is a specific set of vocabulary, formatting, and structure, to perform specific tasks. Other coding languages are Java, C++, python, etc.
Rstudio is an interface that runs the R software in a user friendly format. Rstudio allows the user to run R code, take notes, view graphs, and view files in one window.
In this course, we will be using Rstudio through the Posit Cloud website. This is a browser based way to use Rstudio. This allows you to access your R projects on any computer. This will be convenient for this class because you are able to work on projects in class and then go home and access your work on your on computer by logging in to your account.
You should already have a posit cloud account and should have received an email with a link to our class workspace. Follow the email link and log in using your posit cloud account.
In Rstudio, you can set up an R project which is a folder on your computer. In Posit Cloud, all work is organized in projects. A project is basically a folder where all your work is contained, including data files, notes, etc.
We will learn how to make a project in Posit Cloud later. For now, open the Learn R project. You will complete the first few lessons here.
When you open the project, you should see three main panes.
The next few sections will go through common tasks performed in R, which are needed to complete analyses.
R can perform basic mathematical calculations. To compute a calculations, simply type it in the Console. You can do addition (+), subtraction (-), multiplication (*), division (/), power (^), to name a few. You can also use parentheses as usual in mathematical equations.
For example for 5 times 2:
5 * 2
## [1] 10
On your own, try some mathematical calculations in the Console.
While you can run code in the console, it is a much better practice to create an R script. This allows you to save your scripts so that you know what you did in an analysis and makes your science reproducible. In the script, you can take notes, write code, and run code all from the same file. You can also save the file and re-run or go back and edit later.
Open up a new script by going to File –> New File –> R script. A script file will open above the console.
All of your code should be typed in the script. To run the code from the notebook, but your cursor on the line that you want to run. Then click “Ctrl + enter” on a Windows or Linux machine or ‘Cmd + enter’ on a Mac.
To add notes, add a ‘#’ to the beginning of a line. This line will not be run in R. For example:
# I am doing math
5 * 10
## [1] 50
On your own, try out some math in a script. Add some notes as well.
Most tasks in R will be accomplished using functions. Functions is code that is stored in R and will run a specific task.
Functions are run using the function name followed by the “arguments.” A function can have any number of arguments.
Functions can range from very simple to very complex. Many functions exist in R without doing anything, you can “load” additional functions through “packages”, or you can even make your own functions.
For example, log is a function that will take the
natural log of the value given as an argument, x.
log(x = 1.5)
## [1] 0.4054651
The log function can also take an addition argument
which sets the base that the logarithms are computed. Two arguments are
possible with log, x (or the value) and base. So to take
the base 10 logarithm of the value 1.5, you could do the following:
log(x = 1.5, base = 10)
## [1] 0.1760913
This could be written without the argument names, but you must be very careful to put the arguments in the correct order.
log(1.5, 10)
## [1] 0.1760913
Hint: To get help with a specific function, use ‘?’ followed by the function name.
?log
There are many other mathematical function such as sum, max, min, mean, median, sd (standard deviation).
In R, you can store information (numbers, tables, text, etc.) in
variables. To do this, a variable is assigned a name using the
characters <-. For example, to assign the letter a the
value 1.5, type:
a <- 1.5
Then, when you run a, you can see the value.
a
## [1] 1.5
You can now treat a as 1.5. For example, you can take
the log of a, which is the same as running
log(1.5)
log(a)
## [1] 0.4054651
Note R is case sensitive! So be aware of your capitalization. “a” and “A” are not the same!
So far, we have only dealt with single numbers, but R can handle numbers or text. With numbers, you can type them in regularly and R will recognize them as such. Text is put into quotations (““). You can use single(’) or double (”) in R. Without the quotation mark R will try to find a variable of the same name.
For example, instead of a number, you can store text in a variable:
b <- "hello"
Then when your run the variable b, you get the text as
the output.
b
## [1] "hello"
We can try out using text with the say function, giving
arguments for the text and the animal.
library(cowsay)
say(b, 'owl')
##
## _______
## < hello >
## -------
## \
## \
##
## /\___/\
## {o}{o}|
## \ v /|
## | \ \
## \___/_/ [ab]
## | |
There are other types of data, but numbers (numeric) and text (characters) are the two main types we will use in this class.
Data can be stored in different data structures including vectors, data frames, and matrices.
Vectors are a one dimensional way to put together
data of the same type (i.e. all text or all characters). The
most common way to create a vector is by using the c()
function, which concatenates, or combines, values into a vector.
Here, a vector with the numbers 5, 10, 20 are made into a vector named “example.vector”
example_vector <- c(5, 10, 20)
example_vector
## [1] 5 10 20
Note Do not use spaces in your names. If you would like to separate words you can use “_” or “-”
You can also perform mathematical calculations or functions with vectors. The function is performed on each entry in the vector
example_vector / 5
## [1] 1 2 4
Vectors can also store character data.
rainbow_colors <- c('red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet')
rainbow_colors
## [1] "red" "orange" "yellow" "green" "blue" "indigo" "violet"
You can also run a function with a vector. For example, the function
sum takes the sum of all numbers.
sum(example_vector)
## [1] 35
You can also use functions on vectors made with characters. For
example, here I will use the function length to see how
long the character vector rainbow_colors is.
length(rainbow_colors)
## [1] 7
In this activity, students will assign data to a vector in R and make a quick graph of the results. Write all of your code in an R script.
roll.hist. hist takes the argument for a vector of
numbers, just like you have in roll.What do you think of the distribution that you got? Is this expected?
You will want to save your script periodically to save your work. To save your script, go to File and click save.
When you save your script, it will save as a “.R” file in your project.
Remember, it is best not to use spaces in your names to help with file management.
Data frames are data structures with rows and columns, like a table. Each column is essentially a vector, so must have the same data type (i.e. numbers or characters). However, a data frame can consists of columns with different data types. This is widely used for storing data because you can have a column with the treatment name (text) and then a column of the measurement (number).
A data frame might look something like this
## treatment growth
## 1 control 5
## 2 control 8
## 3 control 6
## 4 high N 1
## 5 high N 8
## 6 high N 9
## 7 low N 5
## 8 low N 2
## 9 low N 3
A matrix is similar to a data frame except that all data in a matrix are the same type (i.e. all characters or all numbers).
Data frames most common data structure that you will be working with. For the most part, you will be importing data into R as a data frame from a .csv file.
The main way that you will be adding data to R is using the
read.csv function. This function takes the file name in
quotes as an argument, file. It also has the argument header. This
defaults to TRUE. If you have a header you can leave it. If not header
should be FALSE. Otherwise the first row of data will be make into the
header. Most of the time your data will have a header row.
When you use are reading in data files, R will look for the file in the folder or “directory” you are in. If the file is not in the home folder, you will need to specify where R should look. Folders in names are separated by “/”. So, for example the directory “Documents/bio_files/” is a folder “bio_files” inside the folder “Documents.” “Documents/bio_files/photosynthesis.pdf” is a PDF file inside the bio_files folder.
Here, I have a file, “fly_data.csv” in a directory named “data”. I can load it in an assign it fly as follows:
fly<-read.csv(file = 'data/fly_data.csv', header = TRUE)
To check the data, it doesn’t always make sense to show the whole
data frame, especially if it is very large. I can use the function
head to only show the first 10 rows.
head(fly)
## species site n
## 1 boycei Chiricahuas 1
## 2 boycei Chiricahuas West 3 1
## 3 boycei Chiricahuas cliff 7
## 4 boycei Chiricahuas-2078 1
## 5 boycei Iron Creek Campsite, Gila, NM 1
## 6 boycei McNarey, AZ 1
Now that we have a data frame read into R, we can use functions to explore the data.
In data frames, you can also specify the column using “$” and the name of the column. This allows you to look at the values in one column. When specify just a single column in the last section, you essentially created a vector. You can also use functions on this vector.
For example, let’s just look at the column for the number of flies, n.
fly$n
## [1] 1 1 7 1 1 1 3 1 4 3 3 1 1 1 1 1 2 3 1 5 1 7 1 1 1 1 4 1 4 4 2 1 1 1 1 1 2
## [38] 2 1 3 1 6 1 2 1 2 3 2 1 4 1 1 1 1 4 1 1 9 1 2 1 1 1 2 1 1 5 4 1 1 1 1 2 1
## [75] 2 1 7 1 3 1 3 3 5 2 5 1 1 1 3 1 3 3 1 1 1 1 1 5 1 4 3 1 3 3 1 1 3 2 2 1 1
## [112] 1 1 1 1 1 1 1 1 2 1 1 1 2 3 5 3 3 1 1 1 1 1 3 4 1 4 1 1 1 2 2 1 1 1 1 1 1
## [149] 1 4 1 1 1 1 1 1 3 1 1 1 4 1 1 1 1 3 1 1 6 1 1 2 1 1 1 2 1 1 1 3 3 2 1
In the fly data set, we have a column for the species of fly, the location, and the number of flies collected. One thing that we can do is use mathematic functions, such as sum, max, min, mean, median, sd (standard deviation).
Let’s look at the total number (sum) of all the flies collected.
sum(fly$n)
## [1] 348
We can also use functions of character data. For example, we can use
the function unique to get every distict value for the
species.
unique(fly$species)
## [1] "boycei" "completa" "juglandis" "suavis"
Now, we will look at the data file that you created with your data on paper airplanes.
First, go back to your data. Let’s make sure it is in a format that can be analyzed in R.
Most of the time in class, we will use R project that are set up for the class. For this activity, let’s practice making a new R project so that you know how.
In the last class, you collected data on paper airplanes and input your data table into sheets and saved it as a .csv file. A csv file is a file where each column is separated by commas. It is important to make sure that your table is formatted in a way that R will read it correctly.
Check your data for these common mistakes:
Question: Look back at your airplane data! Fix any issues and have it checked by Dr. Glover before moving on..
There is one extra step to reading in data files when using posit cloud. You have to import the data into posit cloud. You can do this in the lower right hand box with the upload button. Upload your class file into your R project. Hint: when you download from google sheets, it should go to the downloads folder.
Activity: Download your airplane data from Google Sheets as a .csv file and add to your R project..
Play around with your airplane data! What can you discover about the data?
Hint What is the average distance the airplanes
flew? (You can use the mean function) The max distance?
Complete these exercises in the Learning R project. For the exercises, you should do all of your code in an R script. You can use the one you started in class or create a new one. You will then export the R script and submit in the Moodle assignment.
In this example, you will simulate rolling dice using the
sample function. This function generates random numbers
from a given set of numbers. Let’s imagine we roll the dice 100
times.
sample(c(1, 2, 3, 4, 5, 6), 100, replace = TRUE). This
gives 100 values from 1 to 6. replace = TRUE means that you
can get a number more than once. Make sure you assign the function to a
name!Here, you will use a data set that contains information about climate data in Raleigh.
read.csv to read the file
“raleigh_prism_climate.csv”. It is in the data folder.When you have completed the homework assignment. Export both the R script you used in class for your airplane data and the one you used for the homework.
Submit the .R files in Moodle. You will not be able to preview the files in Moodle, but submit them anyway! I will be able to view them just fine.
logcsumlengthuniqueread.csvheadsummax and min