In this practice, we will work with a dataset that comes with
the ggplot2 package.
To load this data, run the following code.
library(ggplot2)
data(mpg)
You can now see the top few lines using the head()
function.
head(mpg)
## # A tibble: 6 × 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
This dataset includes fuel economy data for 38 different car models in 1999 and 2008. We will work with this dataset throughout this practice set.
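As a quick check of that description, here is a minimal sketch that counts the distinct model names and lists the years covered. It uses base R's unique(), length(), and sort(), which are not otherwise part of this practice set, and the names n_models and years are only illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Number of distinct values in the model column
n_models <- length(unique(mpg$model))
n_models

# Years that appear in the dataset, in increasing order
years <- sort(unique(mpg$year))
years
```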
In this activity, I will provide hints throughout. Only use the hints if you need them! Try to see what you can do without them first.
Use the colnames() function.
colnames(mpg)
Use the nrow() function.
nrow(mpg)
Access the hwy column using
the $ sign. The function for average is mean().
mean(mpg$hwy)
Access the cty column using
the $ sign. The function for minimum is min().
min(mpg$cty)
dplyr functions
What are the different car classes that are in the dataset?
Hint1
You will need to load the dplyr package before you
can use any of its functions: library(dplyr)
Hint2
You can use the distinct() function on the
class column.
Answer
mpg |>
distinct(class)
Select the columns for just the manufacturer, model, year, and highway mpg.
Hint
Use the select() function and type the column names
exactly as they appear in colnames()
Answer
mpg |>
select(manufacturer, model, year, hwy)
How many of the cars in the dataset are SUVs?
Hint
You need to use the filter() function to show only SUVs
from the class column. You can also add the
nrow() function to get the number of rows.
Answer
mpg |>
filter(class == 'suv')
mpg |>
filter(class == 'suv') |>
nrow()
How many Hondas were there in 2008?
Hint
You will have to filter for both honda and 2008.
Answer
mpg |>
filter(manufacturer == 'honda', year == 2008)
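Because the question asks for a count, one option is to pipe the filtered rows into nrow(), as in the SUV answer. A minimal sketch, where the name n_honda_2008 is only illustrative:

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Keep only Honda rows from 2008, then count them
n_honda_2008 <- mpg |>
  filter(manufacturer == "honda", year == 2008) |>
  nrow()

n_honda_2008
```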
How many Hondas or Toyotas were there in 1999?
Hint
You will have to filter for honda or toyota by putting the “|” symbol between the two manufacturer conditions, and also filter the year for 1999.
Answer
mpg |>
filter(manufacturer == 'honda' | manufacturer == 'toyota', year == 1999)
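When the list of values grows, chaining “|” conditions gets long. As an alternative sketch, the base R %in% operator tests membership in a vector and should select the same rows here. The name honda_toyota_1999 is only illustrative.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Keep rows whose manufacturer is one of the listed values, from 1999 only
honda_toyota_1999 <- mpg |>
  filter(manufacturer %in% c("honda", "toyota"), year == 1999)

# Count the matching rows
nrow(honda_toyota_1999)
```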
What car has the lowest highway mpg?
Hint
Use the arrange() function on the hwy
column.
Answer
mpg |>
arrange(hwy)
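Sorting shows the whole table with the lowest values at the top. If only the lowest rows are wanted, a possible alternative is slice_min(), assuming dplyr 1.0.0 or later is installed. By default it keeps all rows tied for the minimum. The name lowest_hwy is only illustrative.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Keep only the row(s) with the smallest hwy value; ties are kept by default
lowest_hwy <- mpg |>
  slice_min(hwy, n = 1)

lowest_hwy
```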
What car has the highest city mpg?
Hint
You will arrange() based on the cty column. To go from
greatest to least you can put a - sign in front of the column.
You can also add head() to show just the first few
rows.
Answer
mpg |>
arrange(-cty)
mpg |>
arrange(-cty) |>
head()
What car had the highest highway mpg in 1999?
Hint
You will arrange() based on the hwy column. To go from
greatest to least you can put a - sign in front of the column. You also
need to filter() for 1999.
Answer
mpg |>
arrange(-hwy) |>
filter(year == 1999)
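The same result can be reached by filtering first, so that only the 1999 rows are sorted. This sketch also uses dplyr's desc() helper in place of the - sign, which is a stylistic choice rather than a requirement. The name best_hwy_1999 is only illustrative.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Filter to 1999 first, then sort from highest to lowest highway mpg
best_hwy_1999 <- mpg |>
  filter(year == 1999) |>
  arrange(desc(hwy)) |>
  head()

best_hwy_1999
```

Using head() rather than a single row keeps any cars that tie for the top value visible.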
Mutate the table to make a new column with the difference between highway mpg and city mpg.
Hint
You will use the mutate() function, which expects a new
column name followed by = and the calculation from columns in the data frame,
e.g. mutate(NEWCOL = COL1 + COL2).
Answer
mpg |>
mutate(diff = hwy - cty)
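A new column can be used by later steps in the same pipeline. As a sketch that combines functions from the earlier questions, the following keeps a few columns and sorts by the new diff column to show the cars with the largest gap. The name largest_gap is only illustrative.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Add the difference column, keep the relevant columns,
# and sort so the largest differences come first
largest_gap <- mpg |>
  mutate(diff = hwy - cty) |>
  select(manufacturer, model, year, cty, hwy, diff) |>
  arrange(desc(diff)) |>
  head()

largest_gap
```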
ggplot graphing
Start with ggplot(data, aes(x = )). Then add the
geom_histogram() layer.
ggplot(mpg, aes(x = cty)) +
geom_histogram()
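Called without arguments, geom_histogram() falls back to a default of 30 bins and prints a message suggesting a better choice. A minimal sketch, assuming a bin width of 1 mpg suits this column (that value is an assumption, not part of the exercise):

```r
library(ggplot2)  # provides the mpg dataset and plotting functions

# Histogram of city mpg with one bin per 1 mpg
p <- ggplot(mpg, aes(x = cty)) +
  geom_histogram(binwidth = 1)

p
```

Trying a few different binwidth values is a reasonable way to see how much the apparent shape of the distribution depends on the binning.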
Use the hwy column
instead of cty.
ggplot(mpg, aes(x = hwy)) +
geom_histogram()
Start with ggplot(data, aes(x = , y = )). Then add the
geom_boxplot() layer.
ggplot(mpg, aes(x = class, y = cty)) +
geom_boxplot()
Use hwy instead of cty.
ggplot(mpg, aes(x = class, y = hwy)) +
geom_boxplot()
ggplot(mpg, aes(x = class, y = hwy)) +
geom_point(aes(color = manufacturer))
Plot engine size (the displ column) on the x axis and the city gas
mileage on the y axis.
The x and y aesthetics are displ and cty,
respectively.
ggplot(mpg, aes(x = displ, y = cty)) +
geom_point()
Put color = into the
geom_point() function. Change it to any color you would like.
ggplot(mpg, aes(x = displ, y = cty)) +
geom_point(color = 'slateblue')
Add a theme layer such as theme_minimal() or theme_bw().
ggplot(mpg, aes(x = displ, y = cty)) +
geom_point(color = 'slateblue') +
theme_bw()
Use labs() to add an x
and y label. Label names should be in quotation marks.
ggplot(mpg, aes(x = displ, y = cty)) +
geom_point(color = 'slateblue') +
theme_bw() +
labs(x = 'Engine Size', y = 'City Gas Mileage')
Use labs() to add a
title.
ggplot(mpg, aes(x = displ, y = cty)) +
geom_point(color = 'slateblue') +
theme_bw() +
labs(x = 'Engine Size', y = 'City Gas Mileage', title = 'Gas mileage decreases with engine size')