Set up

In this practice, we will work with a dataset that comes with the ggplot2 package.

To load this data, run the following code.

library(ggplot2)
data(mpg)

You can now see the first few rows using the head() function.

head(mpg)
## # A tibble: 6 × 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…

This dataset includes fuel economy data for 38 different car models from the years 1999 and 2008. We will work with this dataset throughout this practice set.
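As a quick sanity check on that description, the following sketch counts the distinct values in the model and year columns. The variable names n_models and years are only illustrative and are not part of the exercises.

```r
library(ggplot2)  # provides the mpg dataset

# Number of distinct car models in the data
n_models <- length(unique(mpg$model))

# The model years that appear in the data, in ascending order
years <- sort(unique(mpg$year))

n_models
years
```

Running ?mpg after loading ggplot2 opens the help page, which describes each column.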

Evaluate the data frame.

In this activity, I will provide hints throughout. Only use the hints if you need them! Try to see what you can do without them first.

  1. What are the column names in the dataset?
    Hint Use the function colnames()
    Answer
     colnames(mpg)
  2. How many rows are in the dataset?
    Hint Use the function nrow()
    Answer
     nrow(mpg)
  3. What is the average highway mpg for the entire dataset?
    Hint You will have to look at just the highway hwy column using the $ sign. The function for average is mean()
    Answer
     mean(mpg$hwy)
  4. What is the minimum city mpg for the entire dataset?
    Hint You will have to select just the city mpg cty column using the $ sign. The function for minimum is min()
    Answer
     min(mpg$cty)
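The four answers above can also be cross-checked with a few other base R functions. This is a minimal sketch, not part of the original exercise; the variable names dims, hwy_summary, and cty_range are hypothetical.

```r
library(ggplot2)  # provides the mpg dataset

# dim() returns the number of rows and the number of columns together
dims <- dim(mpg)

# summary() reports the minimum, quartiles, mean, and maximum in one call
hwy_summary <- summary(mpg$hwy)

# range() returns the minimum and the maximum as a length-two vector
cty_range <- range(mpg$cty)

dims
hwy_summary
cty_range
```

The first element of dims should agree with nrow(mpg), the Mean entry of hwy_summary with mean(mpg$hwy), and the first element of cty_range with min(mpg$cty).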

dplyr functions

  1. What are the different car classes that are in the dataset?

    Hint1

    You will need to load the dplyr package before you can use any of its functions: library(dplyr)

    Hint2

    You can use the distinct function on the column class.

    Answer

     mpg |>
       distinct(class)
  2. Select the columns for just the manufacturer, model, year, and highway mpg.

    Hint

    Use the select() function and type the column names exactly as they appear in colnames()

    Answer

     mpg |>
       select(manufacturer, model, year, hwy)
  3. How many of the cars in the dataset are suvs?

    Hint

    You need to use the filter() function to show only SUVs from the class column. You can also add the nrow() function to get the number of rows.

    Answer

     mpg |>
       filter(class == 'suv')
    
     mpg |>
       filter(class == 'suv') |>
       nrow()
  4. How many Hondas were there in 2008?

    Hint

    You will have to filter for both honda and 2008.

    Answer

     mpg |>
       filter(manufacturer == 'honda', year == 2008)
  5. How many Hondas or Toyotas were there in 1999?

    Hint

    You will have to filter for honda or toyota by placing the “|” (or) symbol between the two manufacturer conditions, then add a separate condition that filters the year for 1999.

    Answer

     mpg |>
       filter(manufacturer == 'honda' | manufacturer == 'toyota', year == 1999)
  6. What car has the lowest highway mpg?

    Hint

    Use the arrange() function on the hwy column.

    Answer

     mpg |>
       arrange(hwy)
  7. What car has the highest city mpg?

    Hint

    You will arrange() based on the cty column. To go from greatest to least you can put a - sign in front of the column.

    You can also add head() to show just the first few rows.

    Answer

     mpg |>
       arrange(-cty)
    
     mpg |>
       arrange(-cty) |>
       head()
  8. What car had the highest highway mpg in 1999?

    Hint

    You will arrange() based on the hwy column. To go from greatest to least you can put a - sign in front of the column. You also need to filter() for 1999.

    Answer

     mpg |>
       arrange(-hwy) |>
       filter(year == 1999)
  9. Mutate the table to make a new column with the difference of highway mpg and city mpg.

    Hint

    You will use the mutate() function, which expects a new column name = the calculation from the columns in the data frame. For example, mutate(NEWCOL = COL1 + COL2).

    Answer

     mpg |>
       mutate(diff = hwy - cty)
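Several of the questions above ask "how many" or "what car", so it can help to reduce the filtered table to a single number or a single row. The sketch below is one possible way to do that, assuming R 4.1 or later for the |> pipe. The variable names are hypothetical; %in% and desc() are alternatives to the | and - notation used in the answers, not replacements required by the exercise.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Question 4: count the rows that remain after filtering
n_honda_2008 <- mpg |>
  filter(manufacturer == "honda", year == 2008) |>
  nrow()

# Question 5: %in% matches any value in the vector, like chaining == with |
n_honda_toyota_1999 <- mpg |>
  filter(manufacturer %in% c("honda", "toyota"), year == 1999) |>
  nrow()

# Question 8: filter first, sort descending with desc(), keep the top row
top_hwy_1999 <- mpg |>
  filter(year == 1999) |>
  arrange(desc(hwy)) |>
  head(1)

n_honda_2008
n_honda_toyota_1999
top_hwy_1999
```

Note that head(1) keeps only one row, so if several cars tie for the highest value, only the first of them is shown.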

ggplot graphing

  1. Make a histogram of city mpg.
    Hint Set up the plot with the following format: ggplot(data, aes(x = )). Then add the geom_histogram() layer.
    Answer
     ggplot(mpg, aes(x = cty)) +
       geom_histogram()
  2. Make a histogram of highway mpg.
    Hint Change your previous code to have the highway hwy column instead of cty
    Answer
     ggplot(mpg, aes(x = hwy)) +
       geom_histogram()
  3. Make a boxplot of city mpg (y axis) for each class (x axis).
    Hint Set up the plot with the following format: ggplot(data, aes(x = , y = )). Then add the geom_boxplot() layer.
    Answer
     ggplot(mpg, aes(x = class, y = cty)) +
       geom_boxplot()
  4. Make a boxplot of highway mpg for each class.
    Hint Change the y axis to hwy instead of cty.
    Answer
     ggplot(mpg, aes(x = class, y = hwy)) +
       geom_boxplot()
  5. Make a scatterplot for the highway mpg (y axis) and class (x axis). Change the color of the points so that each manufacturer is a different color.
    Hint To change the color, you must specify color = inside an aes() function in geom_point. Set color equal to the column you want to color by.
    Answer
     ggplot(mpg, aes(x = class, y = hwy)) +
       geom_point(aes(color = manufacturer))
  6. Make a scatterplot of the relationship between the size of the engine (the displ column) on the x axis and the city gas mileage on the y axis.
    Hint Set up the plot in the same way as the last plot, changing the columns for the x and y axis to displ and cty, respectively.
    Answer
     ggplot(mpg, aes(x = displ, y = cty)) +
       geom_point()
  7. Using the previous scatterplot, change the color of the points to whatever color you prefer.
    Hint Using the last plot, add a color = into the geom_point function. Change it to any color you would like.
    Answer
     ggplot(mpg, aes(x = displ, y = cty)) +
       geom_point(color = 'slateblue')
  8. Using the previous scatterplot, add a theme.
    Hint Using the last plot, add any theme_* that you prefer. For example, theme_minimal().
    Answer
     ggplot(mpg, aes(x = displ, y = cty)) +
       geom_point(color = 'slateblue') +
       theme_bw()
  9. Using the previous scatterplot, change the x and y axis labels.
    Hint Using the last plot, use the function labs() to add an x and y label. Label names should be in quotation marks.
    Answer
     ggplot(mpg, aes(x = displ, y = cty)) +
       geom_point(color = 'slateblue') +
       theme_bw() +
       labs(x = 'Engine Size', y = 'City Gas Mileage')
  10. Using the previous scatterplot, add a title.
    Hint Using the last plot, use the function labs() to add a title.
    Answer
     ggplot(mpg, aes(x = displ, y = cty)) +
       geom_point(color = 'slateblue') +
       theme_bw() +
       labs(x = 'Engine Size', y = 'City Gas Mileage',
            title = 'Gas mileage decreases with engine size')
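Two small variations on the plots above may be worth trying. Calling geom_histogram() without arguments falls back to 30 bins and prints a message suggesting a better value, so the sketch below sets binwidth explicitly. It also shows that grouping a boxplot by a different column only requires changing the x mapping. The object names p_hist and p_box, the binwidth of 1, and the use of coord_flip() are illustrative choices, not part of the original answers.

```r
library(ggplot2)  # provides the mpg dataset

# Histogram of city mpg with one bin per mpg value
p_hist <- ggplot(mpg, aes(x = cty)) +
  geom_histogram(binwidth = 1)

# Boxplot of highway mpg grouped by manufacturer instead of class;
# coord_flip() swaps the axes so the manufacturer names are easier to read
p_box <- ggplot(mpg, aes(x = manufacturer, y = hwy)) +
  geom_boxplot() +
  coord_flip()

# Printing a stored plot object draws it
p_hist
p_box
```

Storing a plot in a variable also makes it easy to add further layers later, for example p_hist + theme_bw().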